Preface


I am happy for you to see this Fifth Edition of Introduction to Linear Algebra. 
This is the text for my video lectures on MIT’s OpenCourseWare (ocw.mit.edu and 
also YouTube). I hope those lectures will be useful to you (maybe even enjoyable!).


Hundreds of colleges and universities have chosen this textbook for their basic linear 
algebra course. A sabbatical gave me a chance to prepare two new chapters about 
probability and statistics and understanding data. Thousands of other improvements too— 
probably only noticed by the author... Here is a new addition for students and all readers: 


Every section opens with a brief summary to explain its contents. When you 
read a new section, and when you revisit a section to review and organize 
it in your mind, those lines are a quick guide and an aid to memory. 


Another big change comes on this book’s website math.mit.edu/linearalgebra. That site 
now contains solutions to the Problem Sets in the book. With unlimited space, this is 
much more flexible than printing short solutions. There are three key websites:


ocw.mit.edu Messages come from thousands of students and faculty about linear algebra 
on this OpenCourseWare site. The 18.06 and 18.06 SC courses include video lectures of 
a complete semester of classes. Those lectures offer an independent review of the whole 
subject based on this textbook—the professor’s time stays free and the student’s time can 
be 2 a.m. (The reader doesn’t have to be in a class at all.) Six million viewers around the 
world have seen these videos (amazing). I hope you find them helpful. 


web.mit.edu/18.06 This site has homeworks and exams (with solutions) for the current 
course as it is taught, and as far back as 1996. There are also review questions, Java demos, 
Teaching Codes, and short essays (and the video lectures). My goal is to make this book 
as useful to you as possible, with all the course material we can provide. 


math.mit.edu/linearalgebra This has become an active website. It now has Solutions 
to Exercises, with space to explain ideas. There are also new exercises from many different
sources: practice problems, development of textbook examples, codes in MATLAB
and Julia and Python, plus whole collections of exams (18.06 and others) for review.


Please visit this linear algebra site. Send suggestions to linearalgebrabook@gmail.com
The Fifth Edition


The cover shows the Four Fundamental Subspaces—the row space and nullspace are 
on the left side, the column space and the nullspace of $A^T$ are on the right. It is not usual
to put the central ideas of the subject on display like this! When you meet those four spaces 
in Chapter 3, you will understand why that picture is so central to linear algebra. 

Those were named the Four Fundamental Subspaces in my first book, and they start 
from a matrix A. Each row of A is a vector in n-dimensional space. When the matrix 
has m rows, each column is a vector in m-dimensional space. The crucial operation in 
linear algebra is to take linear combinations of column vectors. This is exactly the result 
of a matrix-vector multiplication. Ax is a combination of the columns of A. 

When we take all combinations Ax of the column vectors, we get the column space. 
If this space includes the vector b, we can solve the equation Ax = b. 


May I call special attention to Section 1.3, where these ideas come early—with two 
specific examples. You are not expected to catch every detail of vector spaces in one day! 
But you will see the first matrices in the book, and a picture of their column spaces. 
There is even an inverse matrix and its connection to calculus. You will be learning the 
language of linear algebra in the best and most efficient way: by using it. 


Every section of the basic course ends with a large collection of review problems. They 
ask you to use the ideas in that section: the dimension of the column space, a basis for
that space, the rank and inverse and determinant and eigenvalues of A. Many problems 
look for computations by hand on a small matrix, and they have been highly praised. The 
Challenge Problems go a step further, and sometimes deeper. Let me give four examples: 


Section 2.1: Which row exchanges of a Sudoku matrix produce another Sudoku matrix?
Section 2.7: If P is a permutation matrix, why is some power $P^k$ equal to I?
Section 3.4: If Ax = b and Cx = b have the same solutions for every b, does A equal C?
Section 4.1: What conditions on the four vectors r, n, c, ℓ allow them to be bases for
the row space, the nullspace, the column space, and the left nullspace of a 2 by 2 matrix?


The Start of the Course 


The equation Ax = b uses the language of linear combinations right away. The vector
Ax is a combination of the columns of A. The equation is asking for a combination that
produces b. The solution vector x comes at three levels and all are important:


1. Direct solution to find x by forward elimination and back substitution.
2. Matrix solution using the inverse matrix: $x = A^{-1}b$ (if A has an inverse).
3. Particular solution (to Ay = b) plus nullspace solution (to Az = 0).


That vector space solution x = y + z is shown on the cover of the book. 




Direct elimination is the most frequently used algorithm in scientific computing. The 
matrix A becomes triangular—then solutions come quickly. We also see bases for the four 
subspaces. But don’t spend forever on practicing elimination ... good ideas are coming. 

The speed of every new supercomputer is tested on Ax = b: pure linear algebra. But 
even a supercomputer doesn’t want the inverse matrix: too slow. Inverses give the simplest
formula $x = A^{-1}b$ but not the top speed. And everyone must know that determinants are
even slower—there is no way a linear algebra course should begin with formulas for the 
determinant of an n by n matrix. Those formulas have a place, but not first place. 


Structure of the Textbook 


Already in this preface, you can see the style of the book and its goal. That goal is serious, 
to explain this beautiful and useful part of mathematics. You will see how the applications 
of linear algebra reinforce the key ideas. This book moves gradually and steadily from 
numbers to vectors to subspaces—each level comes naturally and everyone can get it. 


Here are 12 points about learning and teaching from this book: 


1. Chapter 1 starts with vectors and dot products. If the class has met them before, 
focus quickly on linear combinations. Section 1.3 provides three independent 
vectors whose combinations fill all of 3-dimensional space, and three dependent 
vectors in a plane. Those two examples are the beginning of linear algebra. 


2. Chapter 2 shows the row picture and the column picture of Ax = b. The heart of 
linear algebra is in that connection between the rows of A and the columns of A: 
the same numbers but very different pictures. Then begins the algebra of matrices: 
an elimination matrix E multiplies A to produce a zero. The goal is to capture
the whole process: start with A, multiply by E’s, end with U.


Elimination is seen in the beautiful form A = LU. The lower triangular L holds 
the forward elimination steps, and U is upper triangular for back substitution. 


3. Chapter 3 is linear algebra at the best level: subspaces. The column space contains 
all linear combinations of the columns. The crucial question is: How many of those 
columns are needed ? The answer tells us the dimension of the column space, and 
the key information about A. We reach the Fundamental Theorem of Linear Algebra. 


4. With more equations than unknowns, it is almost sure that Ax = b has no solution. 
We cannot throw out every measurement that is close but not perfectly exact! 
When we solve by least squares, the key will be the matrix $A^T A$. This wonderful
matrix appears everywhere in applied mathematics, when A is rectangular. 


5. Determinants give formulas for all that has come before: Cramer’s Rule,
inverse matrices, volumes in n dimensions. We don’t need those formulas to compute.
They slow us down. But det A = 0 tells when a matrix is singular: this is
the key to eigenvalues.


6. Section 6.1 explains eigenvalues for 2 by 2 matrices. Many courses want to see
eigenvalues early. It is completely reasonable to come here directly from Chapter 3,
because the determinant is easy for a 2 by 2 matrix. The key equation is $Ax = \lambda x$.


Eigenvalues and eigenvectors are an astonishing way to understand a square matrix. 
They are not for Ax = b, they are for dynamic equations like du/dt = Au. 
The idea is always the same: follow the eigenvectors. In those special directions, 
A acts like a single number (the eigenvalue λ) and the problem is one-dimensional.


An essential highlight of Chapter 6 is diagonalizing a symmetric matrix. 
When all the eigenvalues are positive, the matrix is “positive definite”. This key 
idea connects the whole course—positive pivots and determinants and eigenvalues 
and energy. I work hard to reach this point in the book and to explain it by examples. 


7. Chapter 7 is new. It introduces singular values and singular vectors. They separate
all matrices into simple pieces, ranked in order of their importance. You will see
one way to compress an image. Especially you can analyze a matrix full of data.


8. Chapter 8 explains linear transformations. This is geometry without axes, algebra
with no coordinates. When we choose a basis, we reach the best possible matrix.


9. Chapter 9 moves from real numbers and vectors to complex vectors and matrices.
The Fourier matrix F is the most important complex matrix we will ever see. And
the Fast Fourier Transform (multiplying quickly by F and $F^{-1}$) is revolutionary.


10. Chapter 10 is full of applications, more than any single course could need:

10.1 Graphs and Networks: leading to the edge-node matrix for Kirchhoff’s Laws
10.2 Matrices in Engineering: differential equations parallel to matrix equations
10.3 Markov Matrices: as in Google’s PageRank algorithm
10.4 Linear Programming: a new requirement x ≥ 0 and minimization of the cost
10.5 Fourier Series: linear algebra for functions and digital signal processing
10.6 Computer Graphics: matrices move and rotate and compress images
10.7 Linear Algebra in Cryptography: this new section was fun to write. The Hill
Cipher is not too secure. It uses modular arithmetic: integers from 0 to p − 1.
Multiplication gives 4 × 5 ≡ 1 (mod 19). For decoding this gives $4^{-1} = 5$.


11. How should computing be included in a linear algebra course? It can open a new
understanding of matrices, and every class will find a balance. MATLAB and Maple and
Mathematica are powerful in different ways. Julia and Python are free and directly
accessible on the Web. Those newer languages are powerful too!


Basic commands begin in Chapter 2. Then Chapter 11 moves toward professional
algorithms. You can upload and download codes for this course on the website.


12. Chapter 12 on Probability and Statistics is new, with truly important applications.
When random variables are not independent we get covariance matrices. Fortunately 
they are symmetric positive definite. The linear algebra in Chapter 6 is needed now. 
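
The modular arithmetic quoted for the Hill Cipher in 10.7 can be checked in a few lines. This is only a sketch: it works with single numbers modulo p = 19, while the cipher itself uses matrices, and the sample `message` value is a hypothetical choice made for illustration.

```python
# Arithmetic modulo p = 19, the modulus quoted in the text.
p = 19

product = (4 * 5) % p          # 20 leaves remainder 1 after dividing by 19
inverse_of_4 = pow(4, -1, p)   # built-in modular inverse (Python 3.8 or later)

# Hypothetical one-number "message": multiply by 4 to encode, by 5 to decode.
message = 7
encoded = (4 * message) % p
decoded = (inverse_of_4 * encoded) % p

print(product, inverse_of_4, encoded, decoded)  # 1 5 9 7
```

Decoding works because multiplying by 4 and then by its inverse 5 is multiplication by 20, which is the same as multiplication by 1 modulo 19.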
The Variety of Linear Algebra


Calculus is mostly about one special operation (the derivative) and its inverse (the integral). 
Of course I admit that calculus could be important .... But so many applications of mathematics
are discrete rather than continuous, digital rather than analog. The century of data
has begun! You will find a light-hearted essay called “Too Much Calculus” on my website. 
The truth is that vectors and matrices have become the language to know. 


Part of that language is the wonderful variety of matrices. Let me give three examples: 


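Symmetric matrix
$$\begin{bmatrix} 2 & -1 & 0 & 0 \\ -1 & 2 & -1 & 0 \\ 0 & -1 & 2 & -1 \\ 0 & 0 & -1 & 2 \end{bmatrix}$$

Orthogonal matrix
$$\frac{1}{2}\begin{bmatrix} 1 & 1 & 1 & 1 \\ 1 & -1 & 1 & -1 \\ 1 & 1 & -1 & -1 \\ 1 & -1 & -1 & 1 \end{bmatrix}$$

Triangular matrix
$$\begin{bmatrix} 1 & 1 & 1 & 1 \\ 0 & 1 & 1 & 1 \\ 0 & 0 & 1 & 1 \\ 0 & 0 & 0 & 1 \end{bmatrix}$$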


A key goal is learning to “read” a matrix. You need to see the meaning in the numbers. 
This is really the essence of mathematics—patterns and their meaning. 

I have used italics and boldface to pick out the key words on each page. I know there 
are times when you want to read quickly, looking for the important lines. 


May I end with this thought for professors. You might feel that the direction is right, 
and wonder if your students are ready. Just give them a chance! Literally thousands of 
students have written to me, frequently with suggestions and surprisingly often with thanks. 
They know this course has a purpose, because the professor and the book are on their side. 
Linear algebra is a fantastic subject, enjoy it. 


Help With This Book 


The greatest encouragement of all is the feeling that you are doing something worthwhile 
with your life. Hundreds of generous readers have sent ideas and examples and corrections 
(and favorite matrices) that appear in this book. Thank you all. 


One person has helped with every word in this book. He is Ashley C. Fernandes, who 
prepared the LaTeX files. It is now six books that he has allowed me to write and rewrite,
aiming for accuracy and also for life. Working with friends is a happy way to live. 


Friends inside and outside the MIT math department have been wonderful. Alan 
Edelman for Julia and much more, Alex Townsend for the flag examples in 7.1, and 
Peter Kempthorne for the finance example in 7.3: those stand out. Don Spickler’s website 
on cryptography is simply excellent. I thank Jon Bloom, Jack Dongarra, Hilary Finucane, 
Pavel Grinfeld, Randy LeVeque, David Vogan, Liang Wang, and Karen Willcox. 
The “eigenfaces” in 7.3 came from Matthew Turk and Jeff Jauregui. And the big step 
to singular values was accelerated by Raj Rao’s great course at Michigan. 


This book owes so much to my happy sabbatical in Oxford. Thank you, Nick Trefethen 
and everyone. Especially you the reader! Best wishes in your work. 
Background of the Author


This is my 9th textbook on linear algebra, and I hesitate to write about myself. It is the 
mathematics that is important, and the reader. The next paragraphs add something brief 
and personal, as a way to say that textbooks are written by people. 

I was born in Chicago and went to school in Washington and Cincinnati and St. Louis. 
My college was MIT (and my linear algebra course was extremely abstract). After that 
came Oxford and UCLA, then back to MIT for a very long time. I don’t know how many 
thousands of students have taken 18.06 (more than 6 million when you include the videos 
on ocw.mit.edu). The time for a fresh approach was right, because this fantastic subject 
was only revealed to math majors—we needed to open linear algebra to the world. 


I am so grateful for a life of teaching mathematics, more than I could possibly tell you. 


Gilbert Strang 


PS I hope the next book (2018?) will include Learning from Data. This subject is growing
quickly, especially “deep learning”. By knowing a function on a training set of old data,
we approximate the function on new data. The approximation only uses one simple nonlinear
function f(x) = max(0, x). It is n matrix multiplications that we optimize to make
the learning deep: $x_1 = f(A_1 x + b_1),\ x_2 = f(A_2 x_1 + b_2),\ \ldots,\ x_n = f(A_n x_{n-1} + b_n)$.
Those are n − 1 hidden layers between the input x and the output $x_n$, which approximates
F(x) on the training set.
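
A minimal sketch of that chain of layers may help. It applies f(x) = max(0, x) to each component after every matrix multiplication. The weights and the input below are hypothetical numbers chosen only for illustration, and the optimization of the matrices (the actual learning) is not shown.

```python
def relu(vector):
    """Apply f(x) = max(0, x) to each component."""
    return [max(0.0, t) for t in vector]

def affine(A, x, b):
    """Return A x + b for a matrix A given as a list of rows."""
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) + b_i
            for row, b_i in zip(A, b)]

def forward(layers, x):
    """Compute x_k = f(A_k x_{k-1} + b_k) for each pair (A_k, b_k) in order."""
    for A, b in layers:
        x = relu(affine(A, x, b))
    return x

# Hypothetical weights for n = 2 layers (one hidden layer in between).
layers = [
    ([[1.0, -1.0], [0.5, 2.0]], [0.0, -1.0]),
    ([[1.0, 1.0]], [-0.5]),
]
x_n = forward(layers, [1.0, 2.0])
print(x_n)  # [3.0]
```

The first layer produces (−1, 3.5), which f turns into (0, 3.5). The second layer then gives 3.5 − 0.5 = 3.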


THE MATRIX ALPHABET 


A   Any Matrix                 P   Permutation Matrix
B   Basis Matrix               P   Projection Matrix
C   Cofactor Matrix            Q   Orthogonal Matrix
D   Diagonal Matrix            R   Upper Triangular Matrix
E   Elimination Matrix         R   Reduced Echelon Matrix
F   Fourier Matrix             S   Symmetric Matrix
H   Hadamard Matrix            T   Linear Transformation
I   Identity Matrix            U   Upper Triangular Matrix
J   Jordan Matrix              U   Left Singular Vectors
K   Stiffness Matrix           V   Right Singular Vectors
L   Lower Triangular Matrix    X   Eigenvector Matrix
M   Markov Matrix              Λ   Eigenvalue Matrix
N   Nullspace Matrix           Σ   Singular Value Matrix


Chapter 1 


Introduction to Vectors 


The heart of linear algebra is in two operations—both with vectors. We add vectors to get 
v +w. We multiply them by numbers c and d to get cv and dw. Combining those two 
operations (adding cv to dw) gives the linear combination cv + dw. 


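Linear combination
$$cv + dw = c\begin{bmatrix} 1 \\ 1 \end{bmatrix} + d\begin{bmatrix} 2 \\ 3 \end{bmatrix} = \begin{bmatrix} c + 2d \\ c + 3d \end{bmatrix}$$


Example $v + w = \begin{bmatrix} 1 \\ 1 \end{bmatrix} + \begin{bmatrix} 2 \\ 3 \end{bmatrix} = \begin{bmatrix} 3 \\ 4 \end{bmatrix}$ is the combination with c = d = 1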

Linear combinations are all-important in this subject! Sometimes we want one particular
combination, the specific choice c = 2 and d = 1 that produces cv + dw = (4,5).
Other times we want all the combinations of v and w (coming from all c and d). 
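
As a small sketch of one particular combination, take v = (1, 1) and w = (2, 3), the vectors consistent with the result (4, 5) quoted above. The helper name `linear_combination` is a hypothetical choice for this illustration.

```python
def linear_combination(c, v, d, w):
    """Return cv + dw, computed one component at a time."""
    return [c * vi + d * wi for vi, wi in zip(v, w)]

v = [1, 1]
w = [2, 3]

sum_vw = linear_combination(1, v, 1, w)    # c = d = 1 gives v + w
special = linear_combination(2, v, 1, w)   # c = 2 and d = 1

print(sum_vw)   # [3, 4]
print(special)  # [4, 5]
```

Asking for all combinations means letting c and d run over every pair of numbers, which no finite loop can do. That is where the picture of a plane takes over from the arithmetic.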

The vectors cv lie along a line. When w is not on that line, the combinations cv + dw 
fill the whole two-dimensional plane. Starting from four vectors u, v, w, z in four-dimensional
space, their combinations cu + dv + ew + fz are likely to fill the space,
but not always. The vectors and their combinations could lie in a plane or on a line.


Chapter 1 explains these central ideas, on which everything builds. We start with two- 
dimensional vectors and three-dimensional vectors, which are reasonable to draw. Then 
we move into higher dimensions. The really impressive feature of linear algebra is how 
smoothly it takes that step into n-dimensional space. Your mental picture stays completely 
correct, even if drawing a ten-dimensional vector is impossible. 

This is where the book is going (into n-dimensional space). The first steps are the 
operations in Sections 1.1 and 1.2. Then Section 1.3 outlines three fundamental ideas. 


1.1 Vector addition v + w and linear combinations cv + dw. 
1.2 The dot product v · w of two vectors and the length $\|v\| = \sqrt{v \cdot v}$.


1.3 Matrices A, linear equations Ax = b, solutions $x = A^{-1}b$.


1.1 Vectors and Linear Combinations 


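1 3v + 5w is a typical linear combination cv + dw of the vectors v and w.


2 For $v = \begin{bmatrix} 1 \\ 1 \end{bmatrix}$ and $w = \begin{bmatrix} 2 \\ 3 \end{bmatrix}$ that combination is $3\begin{bmatrix} 1 \\ 1 \end{bmatrix} + 5\begin{bmatrix} 2 \\ 3 \end{bmatrix} = \begin{bmatrix} 3 + 10 \\ 3 + 15 \end{bmatrix} = \begin{bmatrix} 13 \\ 18 \end{bmatrix}$.


3 The vector $\begin{bmatrix} 2 \\ 3 \end{bmatrix} = \begin{bmatrix} 2 \\ 0 \end{bmatrix} + \begin{bmatrix} 0 \\ 3 \end{bmatrix}$ goes across to x = 2 and up to y = 3 in the xy plane.


4 The combinations $c\begin{bmatrix} 1 \\ 1 \end{bmatrix} + d\begin{bmatrix} 2 \\ 3 \end{bmatrix}$ fill the whole xy plane. They produce every $\begin{bmatrix} x \\ y \end{bmatrix}$.


5 The combinations $c\begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix} + d\begin{bmatrix} 2 \\ 3 \\ 4 \end{bmatrix}$ fill a plane in xyz space. Same plane for $\begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix}, \begin{bmatrix} 3 \\ 4 \\ 5 \end{bmatrix}$.


6 But the three equations c + 2d = 1, c + 3d = 0, c + 4d = 0 have no solution because the right side $\begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix}$ is not on that plane.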


“You can’t add apples and oranges.” In a strange way, this is the reason for vectors. 
We have two separate numbers $v_1$ and $v_2$. That pair produces a two-dimensional vector v:


Column vector v   $v = \begin{bmatrix} v_1 \\ v_2 \end{bmatrix}$   $v_1$ = first component of v, $v_2$ = second component of v


We write v as a column, not as a row. The main point so far is to have a single letter v 
(in boldface italic) for this pair of numbers $v_1$ and $v_2$ (in lightface italic).

Even if we don’t add $v_1$ to $v_2$, we do add vectors. The first components of v and w
stay separate from the second components:


VECTOR ADDITION   $v = \begin{bmatrix} v_1 \\ v_2 \end{bmatrix}$ and $w = \begin{bmatrix} w_1 \\ w_2 \end{bmatrix}$ add to $v + w = \begin{bmatrix} v_1 + w_1 \\ v_2 + w_2 \end{bmatrix}$.


Subtraction follows the same idea: The components of v − w are $v_1 - w_1$ and $v_2 - w_2$.
The other basic operation is scalar multiplication. Vectors can be multiplied by 2 or by
−1 or by any number c. To find 2v, multiply each component of v by 2:


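SCALAR MULTIPLICATION   $2v = \begin{bmatrix} 2v_1 \\ 2v_2 \end{bmatrix} = v + v$ $\qquad$ $-v = \begin{bmatrix} -v_1 \\ -v_2 \end{bmatrix}$


The components of cv are $cv_1$ and $cv_2$. The number c is called a “scalar”.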

Notice that the sum of —v and v is the zero vector. This is 0, which is not the same as 
the number zero! The vector 0 has components 0 and 0. Forgive me for hammering away 
at the difference between a vector and its components. Linear algebra is built on these 
operations v + w and cv and dw—adding vectors and multiplying by scalars. 
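
The two operations are short enough to write out directly. This is a sketch with vectors stored as Python lists. The names `add` and `scale` and the sample vectors are choices made for this illustration.

```python
def add(v, w):
    """Vector addition: add matching components."""
    return [vi + wi for vi, wi in zip(v, w)]

def scale(c, v):
    """Scalar multiplication: multiply every component by the number c."""
    return [c * vi for vi in v]

v = [4, 2]
w = [-1, 2]

v_plus_w = add(v, w)               # [3, 4]
two_v = scale(2, v)                # [8, 4], the same as v + v
v_minus_w = add(v, scale(-1, w))   # [5, 0]
zero = add(scale(-1, v), v)        # [0, 0] is the zero vector, not the number 0

print(v_plus_w, two_v, v_minus_w, zero)
```

Subtraction needs no separate function here: v − w is the sum of v and (−1)w.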




Linear Combinations 


Now we combine addition with scalar multiplication to produce a “linear combination” 
of v and w. Multiply v by c and multiply w by d. Then add cv + dw. 


The sum of cv and dw is a linear combination cv + dw. 


Four special linear combinations are: sum, difference, zero, and a scalar multiple cv: 


1v + 1w = sum of vectors in Figure 1.1a
1v − 1w = difference of vectors in Figure 1.1b
0v + 0w = zero vector
cv + 0w = vector cv in the direction of v


The zero vector is always a possible combination (its coefficients are zero). Every time we 
see a “space” of vectors, that zero vector will be included. This big view, taking all the 
combinations of v and w, is linear algebra at work. 

The figures show how you can visualize vectors. For algebra, we just need the components
(like 4 and 2). That vector v is represented by an arrow. The arrow goes $v_1 = 4$
units to the right and $v_2 = 2$ units up. It ends at the point whose x, y coordinates are 4, 2.
This point is another representation of the vector, so we have three ways to describe v:


Represent vector v Two numbers Arrow from (0,0) Point in the plane 


We add using the numbers. We visualize v + w using arrows: 
Vector addition (head to tail) At the end of v, place the start of w. 


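$v + w = \begin{bmatrix} 4 \\ 2 \end{bmatrix} + \begin{bmatrix} -1 \\ 2 \end{bmatrix} = \begin{bmatrix} 3 \\ 4 \end{bmatrix}$ $\qquad$ $v - w = \begin{bmatrix} 4 \\ 2 \end{bmatrix} - \begin{bmatrix} -1 \\ 2 \end{bmatrix} = \begin{bmatrix} 5 \\ 0 \end{bmatrix}$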


Figure 1.1: Vector addition v + w = (3, 4) produces the diagonal of a parallelogram. 
The reverse of w is —w. The linear combination on the right is v — w = (5, 0). 


We travel along v and then along w. Or we take the diagonal shortcut along v + w. 
We could also go along w and then v. In other words, w + v gives the same answer as 
v + w. These are different ways along the parallelogram (in this example it is a rectangle). 




Vectors in Three Dimensions 


A vector with two components corresponds to a point in the xy plane. The components of v
are the coordinates of the point: $x = v_1$ and $y = v_2$. The arrow ends at this point $(v_1, v_2)$,
when it starts from (0,0). Now we allow vectors to have three components $(v_1, v_2, v_3)$.

The xy plane is replaced by three-dimensional xyz space. Here are typical vectors
(still column vectors but with three components):


$$v = \begin{bmatrix} 1 \\ 1 \\ -1 \end{bmatrix} \quad\text{and}\quad w = \begin{bmatrix} 2 \\ 3 \\ 4 \end{bmatrix} \quad\text{and}\quad v + w = \begin{bmatrix} 3 \\ 4 \\ 3 \end{bmatrix}$$

The vector v corresponds to an arrow in 3-space. Usually the arrow starts at the “origin”, 
where the xyz axes meet and the coordinates are (0,0,0). The arrow ends at the point 
with coordinates v1, v2, v3. There is a perfect match between the column vector and the 
arrow from the origin and the point where the arrow ends. 

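The vector (x,y) in the plane is different from (x, y, 0) in 3-space!


Figure 1.2: Vectors $\begin{bmatrix} x \\ y \end{bmatrix}$ and $\begin{bmatrix} x \\ y \\ z \end{bmatrix}$ correspond to points (x,y) and (x, y, z).

From now on $v = \begin{bmatrix} 1 \\ 1 \\ -1 \end{bmatrix}$ is also written as v = (1, 1, −1).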


The reason for the row form (in parentheses) is to save space. But v = (1, 1, −1) is
not a row vector! It is in actuality a column vector, just temporarily lying down. The row
vector [1 1 −1] is absolutely different, even though it has the same three components.
That 1 by 3 row vector is the “transpose” of the 3 by 1 column vector v. 




In three dimensions, v + w is still found a component at a time. The sum has
components $v_1 + w_1$ and $v_2 + w_2$ and $v_3 + w_3$. You see how to add vectors in 4 or 5
or n dimensions. When w starts at the end of v, the third side is v + w. The other way
around the parallelogram is w + v. Question: Do the four sides all lie in the same plane?
Yes. And the sum v + w − v − w goes completely around to produce the ____ vector.

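A typical linear combination of three vectors in three dimensions is u + 4v − 2w:


Linear combination: multiply by 1, 4, −2, then add
$$\begin{bmatrix} 1 \\ 0 \\ 3 \end{bmatrix} + 4\begin{bmatrix} 1 \\ 2 \\ 1 \end{bmatrix} - 2\begin{bmatrix} 2 \\ 3 \\ -1 \end{bmatrix} = \begin{bmatrix} 1 \\ 2 \\ 9 \end{bmatrix}$$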
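
The same computation can be sketched for any number of vectors. The function name `combine` is a hypothetical choice. The vectors are u = (1, 0, 3), v = (1, 2, 1), w = (2, 3, −1) with coefficients 1, 4, −2.

```python
def combine(coefficients, vectors):
    """Return the sum of coefficient * vector over all pairs, by components."""
    size = len(vectors[0])
    return [sum(c * vec[i] for c, vec in zip(coefficients, vectors))
            for i in range(size)]

u = [1, 0, 3]
v = [1, 2, 1]
w = [2, 3, -1]

result = combine([1, 4, -2], [u, v, w])
print(result)  # [1, 2, 9]
```

Each component is handled separately: 1 + 4 − 4 = 1, then 0 + 8 − 6 = 2, then 3 + 4 + 2 = 9.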


The Important Questions 


For one vector u, the only linear combinations are the multiples cu. For two vectors, 
the combinations are cu + dv. For three vectors, the combinations are cu + dv + ew. 
Will you take the big step from one combination to all combinations? Every c and d and 
e are allowed. Suppose the vectors u, v, w are in three-dimensional space: 


1. What is the picture of all combinations cu? 
2. What is the picture of all combinations cu + dv? 
3. What is the picture of all combinations cu + dv + ew? 


The answers depend on the particular vectors u, v, and w. If they were zero vectors (a very 
extreme case), then every combination would be zero. If they are typical nonzero vectors 
(components chosen at random), here are the three answers. This is the key to our subject: 


1. The combinations cu fill a line through (0,0,0). 
2. The combinations cu + dv fill a plane through (0,0,0). 
3. The combinations cu + dv + ew fill three-dimensional space. 


The zero vector (0,0, 0) is on the line because c can be zero. It is on the plane because c 
and d could both be zero. The line of vectors cu is infinitely long (forward and backward). 
It is the plane of all cu + dv (combining two vectors in three-dimensional space) that 
I especially ask you to think about. 


Adding all cu on one line to all dv on the other line fills in the plane in Figure 1.3. 


When we include a third vector w, the multiples ew give a third line. Suppose that 
third line is not in the plane of u and v. Then combining all ew with all cu + dv fills up 
the whole three-dimensional space. 

This is the typical situation! Line, then plane, then space. But other possibilities exist. 
When w happens to be cu + dv, that third vector w is in the plane of the first two. 
The combinations of u,v, w will not go outside that uv plane. We do not get the full 
three-dimensional space. Please think about the special cases in Problem 1. 
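
One way to test whether three given vectors fill all of three-dimensional space is the triple product u · (v × w). It is zero exactly when the three vectors lie in a common plane through (0, 0, 0). This sketch uses the cross product, which this section has not introduced, so treat it as an assumption brought in for illustration. The sample vectors are hypothetical too.

```python
def cross(v, w):
    """Cross product of two vectors in three dimensions."""
    return [v[1] * w[2] - v[2] * w[1],
            v[2] * w[0] - v[0] * w[2],
            v[0] * w[1] - v[1] * w[0]]

def dot(v, w):
    """Dot product: multiply matching components and add."""
    return sum(vi * wi for vi, wi in zip(v, w))

def fills_space(u, v, w):
    """True when the combinations cu + dv + ew reach all of 3D space."""
    return dot(u, cross(v, w)) != 0

u = [1, 0, 0]
v = [0, 1, 0]

typical = fills_space(u, v, [0, 0, 1])    # third line leaves the uv plane
in_plane = fills_space(u, v, [2, 3, 0])   # here w = 2u + 3v stays in the plane

print(typical, in_plane)  # True False
```

The exact comparison with zero is safe for whole numbers. With decimal components a small tolerance would be the more careful choice.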
Figure 1.3: (a) Line through u, containing all cu. (b) The plane containing the lines through u and v, from all cu + dv.


= REVIEW OF THE KEY IDEAS = 


1. A vector v in two-dimensional space has two components $v_1$ and $v_2$.
2. $v + w = (v_1 + w_1, v_2 + w_2)$ and $cv = (cv_1, cv_2)$ are found a component at a time.
3. A linear combination of three vectors u and v and w is cu + dv + ew.
4. Take all linear combinations of u, or u and v, or u, v, w. In three dimensions,
those combinations typically fill a line, then a plane, then the whole space $\mathbf{R}^3$.


= WORKED EXAMPLES =
1.1 A The linear combinations of v = (1,1,0) and w = (0,1,1) fill a plane in $\mathbf{R}^3$.
Describe that plane. Find a vector that is not a combination of v and w (not on the plane).


Solution The plane of v and w contains all combinations cv + dw. The vectors in that 
plane allow any c and d. The plane of Figure 1.3 fills in between the two lines. 


Combinations $cv + dw = c\begin{bmatrix} 1 \\ 1 \\ 0 \end{bmatrix} + d\begin{bmatrix} 0 \\ 1 \\ 1 \end{bmatrix} = \begin{bmatrix} c \\ c + d \\ d \end{bmatrix}$ fill a plane.


Four vectors in that plane are (0,0,0) and (2,3,1) and (5,7,2) and (π, 2π, π).
The second component c + d is always the sum of the first and third components.
Like most vectors, (1, 2, 3) is not in the plane, because 2 ≠ 1 + 3.

Another description of this plane through (0,0,0) is to know that n = (1, −1, 1) is
perpendicular to the plane. Section 1.2 will confirm that 90° angle by testing dot products:
v · n = 0 and w · n = 0. Perpendicular vectors have zero dot products.
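
Both descriptions of the plane can be checked with a short sketch. A point is on the plane exactly when its dot product with n = (1, −1, 1) is zero, which is the same test as "second component = first + third". The helper names are hypothetical.

```python
def dot(a, b):
    """Dot product: multiply matching components and add."""
    return sum(ai * bi for ai, bi in zip(a, b))

n = [1, -1, 1]   # perpendicular to the plane of v and w
v = [1, 1, 0]
w = [0, 1, 1]

def in_plane(point):
    """True when the point satisfies n . point = 0."""
    return dot(n, point) == 0

points = [[0, 0, 0], [2, 3, 1], [5, 7, 2], [1, 2, 3]]
checks = [in_plane(point) for point in points]

print(dot(v, n), dot(w, n))  # 0 0
print(checks)                # [True, True, True, False]
```

For (1, 2, 3) the dot product is 1 − 2 + 3 = 2, so that vector is off the plane.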
1.1 B For v = (1,0) and w = (0,1), describe all points cv with (1) whole numbers c
(2) nonnegative numbers c ≥ 0. Then add all vectors dw and describe all cv + dw.


Solution 


(1) The vectors cv = (c,0) with whole numbers c are equally spaced points along the
x axis (the direction of v). They include (−2, 0), (−1, 0), (0,0), (1, 0), (2,0).


(2) The vectors cv with c ≥ 0 fill a half-line. It is the positive x axis. This half-line
starts at (0,0) where c = 0. It includes (100, 0) and (π, 0) but not (−100, 0).


(1′) Adding all vectors dw = (0, d) puts a vertical line through those equally spaced cv.
We have infinitely many parallel lines from (whole number c, any number d).


(2′) Adding all vectors dw puts a vertical line through every cv on the half-line. Now we
have a half-plane. The right half of the xy plane has any x ≥ 0 and any y.


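1.1 C Find two equations for c and d so that the linear combination cv + dw equals b:


$$v = \begin{bmatrix} 2 \\ -1 \end{bmatrix} \qquad w = \begin{bmatrix} -1 \\ 2 \end{bmatrix} \qquad b = \begin{bmatrix} 1 \\ 0 \end{bmatrix}$$
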
Solution In applying mathematics, many problems have two parts: 
1 Modeling part Express the problem by a set of equations. 


2 Computational part Solve those equations by a fast and accurate algorithm. 


Here we are only asked for the first part (the equations). Chapter 2 is devoted to the second 
part (the solution). Our example fits into a fundamental model for linear algebra: 


Find n numbers $c_1, \ldots, c_n$ so that $c_1 v_1 + \cdots + c_n v_n = b$.


For n = 2 we will find a formula for the c’s. The “elimination method” in Chapter 2 
succeeds far beyond n = 1000. For n greater than 1 billion, see Chapter 11. Here n = 2: 


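Vector equation $cv + dw = b$: $\quad c\begin{bmatrix} 2 \\ -1 \end{bmatrix} + d\begin{bmatrix} -1 \\ 2 \end{bmatrix} = \begin{bmatrix} 1 \\ 0 \end{bmatrix}$


The required equations for c and d just come from the two components separately:


Two ordinary equations: $\quad 2c - d = 1 \quad\text{and}\quad -c + 2d = 0$

Each equation produces a line. The two lines cross at the solution $c = \frac{2}{3}$, $d = \frac{1}{3}$. Why not
see this also as a matrix equation, since that is where we are going:


2 by 2 matrix $\quad \begin{bmatrix} 2 & -1 \\ -1 & 2 \end{bmatrix} \begin{bmatrix} c \\ d \end{bmatrix} = \begin{bmatrix} 1 \\ 0 \end{bmatrix}$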
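
The example asks only for the equations, but the crossing point c = 2/3, d = 1/3 can be confirmed with a small elimination sketch. Exact fractions avoid rounding. The function name `solve_2_by_2` is hypothetical, and the code assumes the first component of v is not zero and that the two lines cross at exactly one point.

```python
from fractions import Fraction

def solve_2_by_2(v, w, b):
    """Find c, d with cv + dw = b by eliminating c from the second equation."""
    v1, v2 = map(Fraction, v)
    w1, w2 = map(Fraction, w)
    b1, b2 = map(Fraction, b)
    # Assumes v1 != 0 and a single crossing point (no parallel lines).
    multiplier = v2 / v1
    d = (b2 - multiplier * b1) / (w2 - multiplier * w1)
    c = (b1 - w1 * d) / v1   # back substitution into the first equation
    return c, d

c, d = solve_2_by_2([2, -1], [-1, 2], [1, 0])
print(c, d)  # 2/3 1/3
```

Substituting back: 2(2/3) − 1/3 = 1 and −2/3 + 2(1/3) = 0, so both equations hold.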
Problem Set 1.1
Problems 1-9 are about addition of vectors and linear combinations. 
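1 Describe geometrically (line, plane, or all of $\mathbf{R}^3$) all linear combinations of

(a) $\begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix}$ and $\begin{bmatrix} 3 \\ 6 \\ 9 \end{bmatrix}$ $\quad$ (b) $\begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix}$ and $\begin{bmatrix} 0 \\ 2 \\ 3 \end{bmatrix}$ $\quad$ (c) $\begin{bmatrix} 2 \\ 0 \\ 0 \end{bmatrix}$ and $\begin{bmatrix} 0 \\ 2 \\ 2 \end{bmatrix}$ and $\begin{bmatrix} 2 \\ 2 \\ 3 \end{bmatrix}$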


2 Draw $v = \begin{bmatrix} 4 \\ 1 \end{bmatrix}$ and $w = \begin{bmatrix} -2 \\ 2 \end{bmatrix}$ and v + w and v − w in a single xy plane.
3 If $v + w = \begin{bmatrix} 5 \\ 1 \end{bmatrix}$ and $v - w = \begin{bmatrix} 1 \\ 5 \end{bmatrix}$, compute and draw the vectors v and w.


4 From $v = \begin{bmatrix} 2 \\ 1 \end{bmatrix}$ and $w = \begin{bmatrix} 1 \\ 2 \end{bmatrix}$, find the components of 3v + w and cv + dw.


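5 Compute u + v + w and 2u + 2v + w. How do you know u, v, w lie in a plane?
These lie in a plane because w = cu + dv. Find c and d.

$$u = \begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix}, \quad v = \begin{bmatrix} -3 \\ 1 \\ -2 \end{bmatrix}, \quad w = \begin{bmatrix} 2 \\ -3 \\ -1 \end{bmatrix}$$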


6 Every combination of v = (1, −2, 1) and w = (0, 1, −1) has components that add
to ____. Find c and d so that cv + dw = (3, 3, −6). Why is (3, 3, 6) impossible?


7 In the xy plane mark all nine of these linear combinations: 


$c\begin{bmatrix} 2 \\ 1 \end{bmatrix} + d\begin{bmatrix} 0 \\ 1 \end{bmatrix}$ with c = 0, 1, 2 and d = 0, 1, 2.


8 The parallelogram in Figure 1.1 has diagonal v + w. What is its other diagonal? 
What is the sum of the two diagonals? Draw that vector sum. 


9 If three corners of a parallelogram are (1,1), (4,2), and (1,3), what are all three of 
the possible fourth corners? Draw two of them. 


Problems 10-14 are about special vectors on cubes and clocks in Figure 1.4. 


10 Which point of the cube is i + j? Which point is the vector sum of i = (1, 0, 0) and 
j = (0, 1,0) and k = (0,0, 1)? Describe all points (x, y, z) in the cube. 


11 Four corners of this unit cube are (0, 0,0), (1, 0,0), (0, 1,0), (0,0, 1). What are the 
other four corners? Find the coordinates of the center point of the cube. The center 
points of the six faces are ____. The cube has how many edges?


12 Review Question. In xyz space, where is the plane of all linear combinations of 
i = (1,0,0) and i + j = (1,1,0)?
Figure 1.4: Unit cube from i, j, k and twelve clock vectors. Labels in the figure include k = (0, 0, 1), j + k, j = (0, 1, 0), i, and the 2:00 clock vector. Notice the illusion: is (0, 0, 0) a top or a bottom corner?
13 (a) What is the sum V of the twelve vectors that go from the center of a clock to 


the hours 1:00, 2:00, ..., 12:00? 
(b) If the 2:00 vector is removed, why do the 11 remaining vectors add to 8:00? 


(c) What are the x, y components of that 2:00 vector v = (cos θ, sin θ)?


14 Suppose the twelve vectors start from 6:00 at the bottom instead of (0,0) at the 
center. The vector to 12:00 is doubled to (0, 2). The new twelve vectors add to ____.


Problems 15-19 go further with linear combinations of v and w (Figure 1.5a). 
15 Figure 1.5a shows ½v + ½w. Mark the points ¾v + ¼w and ¼v + ¼w and v + w.


16 Mark the point −v + 2w and any other combination cv + dw with c + d = 1.
Draw the line of all combinations that have c + d = 1.


17 Locate ⅓v + ⅓w and ⅔v + ⅔w. The combinations cv + cw fill out what line?


18 Restricted by 0 ≤ c ≤ 1 and 0 ≤ d ≤ 1, shade in all combinations cv + dw.


19 Restricted only by c ≥ 0 and d ≥ 0 draw the “cone” of all combinations cv + dw.


Figure 1.5: (a) Problems 15-19 in a plane. (b) Problems 20-25 in 3-dimensional space.


Problems 20-25 deal with u, v, w in three-dimensional space (see Figure 1.5b).


20 Locate ⅓u + ⅓v + ⅓w and ½u + ½w in Figure 1.5b. Challenge problem: Under
what restrictions on c, d, e, will the combinations cu + dv + ew fill in the dashed
triangle? To stay in the triangle, one requirement is c ≥ 0, d ≥ 0, e ≥ 0.


21 The three sides of the dashed triangle are v − u and w − v and u − w. Their sum is
____. Draw the head-to-tail addition around a plane triangle of (3, 1) plus (−1, 1)
plus (−2, −2).


22 Shade in the pyramid of combinations cu + dv + ew with c ≥ 0, d ≥ 0, e ≥ 0 and
c + d + e ≤ 1. Mark the vector ½(u + v + w) as inside or outside this pyramid.


23 If you look at all combinations of those u, v, and w, is there any vector that can’t be
produced from cu + dv + ew? Different answer if u, v, w are all in ____.


24 Which vectors are combinations of u and v, and also combinations of v and w? 
25 Draw vectors u, v, w so that their combinations cu + dv + ew fill only a line. 


Find vectors u, v, w so that their combinations cu + dv + ew fill only a plane. 


26 What combination c (1, 2) + d (3, 1) produces (14, 8)? Express this question as two
equations for the coefficients c and d in the linear combination.


Challenge Problems 


27 How many corners does a cube have in 4 dimensions? How many 3D faces? 
How many edges? A typical corner is (0,0, 1,0). A typical edge goes to (0, 1, 0,0). 


28 Find vectors v and w so that v + w = (4, 5, 6) and v − w = (2, 5, 8). This is a
question with ____ unknown numbers, and an equal number of equations to find
those numbers.


29 Find two different combinations of the three vectors u = (1,3) and v = (2,7) and 
w = (1,5) that produce b = (0,1). Slightly delicate question: If I take any three 
vectors u, v, w in the plane, will there always be two different combinations that 
produce b = (0,1)? 


30 The linear combinations of v = (a, b) and w = (c, d) fill the plane unless ____.
Find four vectors u, v, w, z with four components each so that their combinations
cu + dv + ew + fz produce all vectors (b1, b2, b3, b4) in four-dimensional space.


31 Write down three equations for c, d, e so that cu + dv + ew = b. Can you somehow
find c, d, e for this b?


1.2 Lengths and Dot Products


1 The “dot product” of v = (1, 2) and w = (4, 5) is v · w = (1)(4) + (2)(5) = 4 + 10 = 14.

2 v = (1, 3, 2) and w = (4, −4, 4) are perpendicular because v · w is zero:
(1)(4) + (3)(−4) + (2)(4) = 0.

3 The length squared of v = (1, 3, 2) is v · v = 1 + 9 + 4 = 14. The length is ||v|| = √14.

4 Then u = v/||v|| = v/√14 = (1, 3, 2)/√14 has length ||u|| = 1. Check 1/14 + 9/14 + 4/14 = 1.

5 The angle θ between v and w has cos θ = (v · w) / (||v|| ||w||).

6 The angle between (1, 0) and (1, 1) has cos θ = 1 / ((1)(√2)). That angle is θ = 45°.

7 All angles have |cos θ| ≤ 1. So all vectors have |v · w| ≤ ||v|| ||w||.

The first section backed off from multiplying vectors. Now we go forward to define
the “dot product” of v and w. This multiplication involves the separate products v1w1 and
v2w2, but it doesn’t stop there. Those two numbers are added to produce one number v · w.

This is the geometry section (lengths of vectors and cosines of angles between them).


The dot product or inner product of v = (v1, v2) and w = (w1, w2) is the number v · w:


v · w = v1w1 + v2w2. (1)


Example 1 The vectors v = (4, 2) and w = (−1, 2) have a zero dot product:


Dot product is zero
Perpendicular vectors (4, 2) · (−1, 2) = −4 + 4 = 0.


In mathematics, zero is always a special number. For dot products, it means that these
two vectors are perpendicular. The angle between them is 90°. When we drew them
in Figure 1.1, we saw a rectangle (not just any parallelogram). The clearest example of
perpendicular vectors is i = (1, 0) along the x axis and j = (0, 1) up the y axis. Again the
dot product is i · j = 0 + 0 = 0. Those vectors i and j form a right angle.

The dot product of v = (1, 2) and w = (3, 1) is 5. Soon v · w will reveal the angle
between v and w (not 90°). Please check that w · v is also 5.


The dot product w · v equals v · w. The order of v and w makes no difference.


Example 2 Put a weight of 4 at the point x = −1 (left of zero) and a weight of 2 at the
point x = 2 (right of zero). The x axis will balance on the center point (like a see-saw).
The weights balance because the dot product is (4)(−1) + (2)(2) = 0.

This example is typical of engineering and science. The vector of weights is (w1, w2) =
(4, 2). The vector of distances from the center is (v1, v2) = (−1, 2). The weights times the
distances, w1v1 and w2v2, give the “moments”. The equation for the see-saw to balance is
w1v1 + w2v2 = 0.


Example 3 Dot products enter in economics and business. We have three goods to buy
and sell. Their prices are (p1, p2, p3) for each unit: this is the “price vector” p. The
quantities we buy or sell are (q1, q2, q3), positive when we sell, negative when we buy.
Selling q1 units at the price p1 brings in q1p1. The total income (quantities q times prices
p) is the dot product q · p in three dimensions:


Income = (q1, q2, q3) · (p1, p2, p3) = q1p1 + q2p2 + q3p3 = dot product.


A zero dot product means that “the books balance”. Total sales equal total purchases if
q · p = 0. Then p is perpendicular to q (in three-dimensional space). A supermarket with
thousands of goods goes quickly into high dimensions.

Small note: Spreadsheets have become essential in management. They compute linear 
combinations and dot products. What you see on the screen is a matrix. 


Main point For v · w, multiply each v_i times w_i. Then v · w = v1w1 + ··· + vnwn.
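
The main point can be tried out in a few lines. This is only a sketch in plain Python; the function name `dot` is an illustrative choice, and the vectors are the ones from Examples 1 and 2 and from the (1, 2), (3, 1) pair above.

```python
def dot(v, w):
    """Dot product v . w = v1*w1 + ... + vn*wn of two equal-length vectors."""
    if len(v) != len(w):
        raise ValueError("v and w must have the same number of components")
    return sum(vi * wi for vi, wi in zip(v, w))

# Example 1: perpendicular vectors give a zero dot product.
print(dot((4, 2), (-1, 2)))

# Example 2: weights (4, 2) at distances (-1, 2) balance the see-saw.
print(dot((4, 2), (-1, 2)) == 0)

# The order makes no difference: v . w equals w . v.
print(dot((1, 2), (3, 1)), dot((3, 1), (1, 2)))
```

The same function works in any number of dimensions, which is all that the supermarket example needs.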


Lengths and Unit Vectors 


An important case is the dot product of a vector with itself. In this case v equals w.
When the vector is v = (1, 2, 3), the dot product with itself is v · v = ||v||² = 14:


Dot product v · v
Length squared ||v||² = (1, 2, 3) · (1, 2, 3) = 1 + 4 + 9 = 14.


Instead of a 90° angle between vectors we have 0°. The answer is not zero because v is not
perpendicular to itself. The dot product v · v gives the length of v squared.


DEFINITION The length ||v|| of a vector v is the square root of v · v:


length = ||v|| = √(v · v) = (v1² + v2² + ··· + vn²)^(1/2).

In two dimensions the length is √(v1² + v2²). In three dimensions it is √(v1² + v2² + v3²).
By the calculation above, the length of v = (1, 2, 3) is ||v|| = √14.

Here ||v|| = √(v · v) is just the ordinary length of the arrow that represents the vector.
If the components are 1 and 2, the arrow is the third side of a right triangle (Figure 1.6).
The Pythagoras formula a² + b² = c² connects the three sides: 1² + 2² = ||v||².


For the length of v = (1, 2, 3), we used the right triangle formula twice. The vector
(1, 2, 0) in the base has length √5. This base vector is perpendicular to (0, 0, 3) that goes
straight up. So the diagonal of the box has length ||v|| = √(5 + 9) = √14.

The length of a four-dimensional vector would be √(v1² + v2² + v3² + v4²). Thus the
vector (1, 1, 1, 1) has length √(1² + 1² + 1² + 1²) = 2. This is the diagonal through a unit
cube in four-dimensional space. That diagonal in n dimensions has length √n.


[Figure 1.6 labels: (1, 2) has length √5, since 5 = 1² + 2². (1, 2, 3) has length √14, since v · v = v1² + v2² + v3² gives 14 = 1² + 2² + 3². (1, 2, 0) in the base has length √5.]

Figure 1.6: The length √(v · v) of two-dimensional and three-dimensional vectors.


The word “unit” is always indicating that some measurement equals “one”. The unit 
price is the price for one item. A unit cube has sides of length one. A unit circle is a circle 
with radius one. Now we see the meaning of a “unit vector”.


DEFINITION A unit vector u is a vector whose length equals one. Then u · u = 1.


An example in four dimensions is u = (½, ½, ½, ½). Then u · u is ¼ + ¼ + ¼ + ¼ = 1.
We divided v = (1, 1, 1, 1) by its length ||v|| = 2 to get this unit vector.

Example 4 The standard unit vectors along the x and y axes are written i and j. In the
xy plane, the unit vector that makes an angle “theta” with the x axis is (cos θ, sin θ):

Unit vectors i = (1, 0) and j = (0, 1) and u = (cos θ, sin θ).

When θ = 0, the horizontal vector u is i. When θ = 90° (or π/2 radians), the vertical
vector is j. At any angle, the components cos θ and sin θ produce u · u = 1 because
cos² θ + sin² θ = 1. These vectors reach out to the unit circle in Figure 1.7. Thus cos θ and
sin θ are simply the coordinates of that point at angle θ on the unit circle.


Since (2, 2, 1) has length 3, the vector (2/3, 2/3, 1/3) has length 1. Check that u · u =
4/9 + 4/9 + 1/9 = 1. For a unit vector, divide any nonzero vector v by its length ||v||.


Unit vector u = v/||v|| is a unit vector in the same direction as v.
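
Length and unit vectors can be checked numerically. The sketch below uses only Python's standard `math` module; the names `length` and `unit` are illustrative, and the zero vector is rejected because it has no direction to normalize.

```python
import math

def length(v):
    """Length ||v|| = square root of v . v."""
    return math.sqrt(sum(x * x for x in v))

def unit(v):
    """Unit vector v / ||v|| in the direction of a nonzero v."""
    n = length(v)
    if n == 0:
        raise ValueError("the zero vector has no direction")
    return [x / n for x in v]

u = unit((2, 2, 1))          # components 2/3, 2/3, 1/3 up to rounding
half = unit((1, 1, 1, 1))    # components 1/2 each, since the length is 2
print(u, length(u))
print(half, length(half))
```

Floating-point rounding means that `length(u)` may differ from 1 in the last digit, so comparisons are best made with a small tolerance.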
[Figure 1.7 labels: i = (1, 0), j = (0, 1); v = (1, 1) and u = (1/√2, 1/√2) = v/||v||; unit vector u = (cos θ, sin θ).]

Figure 1.7: The coordinate vectors i and j. The unit vector u at angle 45° (left) divides
v = (1, 1) by its length ||v|| = √2. The unit vector u = (cos θ, sin θ) is at angle θ.


The Angle Between Two Vectors 


We stated that perpendicular vectors have v · w = 0. The dot product is zero when the
angle is 90°. To explain this, we have to connect angles to dot products. Then we show
how v · w finds the angle between any two nonzero vectors v and w.


Right angles The dot product is v · w = 0 when v is perpendicular to w.


Proof When v and w are perpendicular, they form two sides of a right triangle.
The third side is v − w (the hypotenuse going across in Figure 1.8). The Pythagoras Law
for the sides of a right triangle is a² + b² = c²:


Perpendicular vectors ||v||² + ||w||² = ||v − w||² (2)

Writing out the formulas for those lengths in two dimensions, this equation is

Pythagoras (v1² + v2²) + (w1² + w2²) = (v1 − w1)² + (v2 − w2)². (3)


The right side begins with v1² − 2v1w1 + w1². Then v1² and w1² are on both sides of the
equation and they cancel, leaving −2v1w1. Also v2² and w2² cancel, leaving −2v2w2.
(In three dimensions there would be −2v3w3.) Now divide by −2 to see v · w = 0:


0 = −2v1w1 − 2v2w2 which leads to v1w1 + v2w2 = 0. (4)


Conclusion Right angles produce v · w = 0. The dot product is zero when the angle is
θ = 90°. Then cos θ = 0. The zero vector v = 0 is perpendicular to every vector w
because 0 · w is always zero.

Now suppose v · w is not zero. It may be positive, it may be negative. The sign of
v · w immediately tells whether we are below or above a right angle. The angle is less than
90° when v · w is positive. The angle is above 90° when v · w is negative. The right side
of Figure 1.8 shows a typical vector v = (3, 1). The angle with w = (1, 3) is less than 90°
because v · w = 6 is positive.
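
The sign test can be written as a tiny function. This is a sketch only: the name `angle_side` and the third test vector (−1, −3) are illustrative choices, not taken from the text. Following the convention above, a zero dot product is reported as perpendicular (which includes the zero vector).

```python
def dot(v, w):
    return sum(a * b for a, b in zip(v, w))

def angle_side(v, w):
    """Classify the angle between v and w from the sign of v . w."""
    d = dot(v, w)
    if d > 0:
        return "less than 90 degrees"
    if d < 0:
        return "greater than 90 degrees"
    return "perpendicular"

v = (3, 1)
for w in [(1, 3), (1, -3), (-1, -3)]:
    print(w, dot(v, w), angle_side(v, w))
```

No square roots or cosines are needed for this question; the sign of one number decides it.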


[Figure 1.8 labels: left, a right triangle with sides v, w, v − w where v · w = 0 and 5 + 20 = 25; right, the line v · w = 0 divides the plane: angle with v less than 90° in one half-plane, greater than 90° in the other half-plane.]

Figure 1.8: Perpendicular vectors have v · w = 0. Then ||v||² + ||w||² = ||v − w||².


The borderline is where vectors are perpendicular to v. On that dividing line between
plus and minus, (1, −3) is perpendicular to (3, 1). The dot product is zero.

The dot product reveals the exact angle θ. For unit vectors u and U, the sign of u · U
tells whether θ < 90° or θ > 90°. More than that, the dot product u · U is the cosine of
θ. This remains true in n dimensions.


Unit vectors u and U at angle θ have u · U = cos θ. Certainly |u · U| ≤ 1.


Remember that cos θ is never greater than 1. It is never less than −1. The dot product of
unit vectors is between −1 and 1. The cosine of θ is revealed by u · U.

Figure 1.9 shows this clearly when the vectors are u = (cos θ, sin θ) and i = (1, 0).
The dot product is u · i = cos θ. That is the cosine of the angle between them.

After rotation through any angle α, these are still unit vectors. The vector i = (1, 0)
rotates to (cos α, sin α). The vector u rotates to (cos β, sin β) with β = α + θ. Their
dot product is cos α cos β + sin α sin β. From trigonometry this is cos(β − α) = cos θ.
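
The rotation argument can be checked numerically. In this sketch the function name `dot_after_rotation` and the sample angles are illustrative assumptions; the code builds the rotated vectors (cos α, sin α) and (cos β, sin β) with β = α + θ and confirms that their dot product stays at cos θ.

```python
import math

def dot_after_rotation(theta, alpha):
    """Rotate i = (1, 0) and u = (cos theta, sin theta) through alpha, then take the dot product."""
    beta = alpha + theta
    i_rot = (math.cos(alpha), math.sin(alpha))
    u_rot = (math.cos(beta), math.sin(beta))
    return i_rot[0] * u_rot[0] + i_rot[1] * u_rot[1]

theta = math.radians(40)
for alpha_deg in (0, 25, 90, 200):
    d = dot_after_rotation(theta, math.radians(alpha_deg))
    print(alpha_deg, round(d, 12), round(math.cos(theta), 12))
```

The two printed columns should agree for every α, up to rounding in the last digits.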


[Figure 1.9 labels: u = (cos θ, sin θ) and i = (1, 0) on the left; after rotation, (cos α, sin α) and u = (cos β, sin β) on the right, with θ = β − α.]

Figure 1.9: Unit vectors: u · U is the cosine of θ (the angle between).

What if v and w are not unit vectors? Divide by their lengths to get u = v/||v|| and
U = w/||w||. Then the dot product of those unit vectors u and U gives cos θ.


COSINE FORMULA If v and w are nonzero vectors then (v · w) / (||v|| ||w||) = cos θ. (5)


Whatever the angle, this dot product of v/||v|| with w/||w|| never exceeds one. That
is the “Schwarz inequality” |v · w| ≤ ||v|| ||w|| for dot products, or more correctly the
Cauchy-Schwarz-Buniakowsky inequality. It was found in France and Germany and
Russia (and maybe elsewhere; it is the most important inequality in mathematics).

Since |cos θ| never exceeds 1, the cosine formula gives two great inequalities:


SCHWARZ INEQUALITY |v · w| ≤ ||v|| ||w||


TRIANGLE INEQUALITY ||v + w|| ≤ ||v|| + ||w||


Example 5 Find cos θ for v = (2, 1) and w = (1, 2) and check both inequalities.


Solution The dot product is v · w = 4. Both v and w have length √5. The cosine is 4/5.

cos θ = (v · w) / (||v|| ||w||) = 4 / (√5 √5) = 4/5.

By the Schwarz inequality, v · w = 4 is less than ||v|| ||w|| = 5. By the triangle inequality,
side 3 = ||v + w|| is less than side 1 + side 2. For v + w = (3, 3) the three sides are
√18 < √5 + √5. Square this triangle inequality to get 18 < 20.
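
Example 5 can be repeated by machine. The sketch below assumes v = (2, 1) and w = (1, 2), which matches the dot product 4, the lengths √5 and the sum (3, 3) quoted in the solution; the helper names are illustrative.

```python
import math

def dot(v, w):
    return sum(a * b for a, b in zip(v, w))

def length(v):
    return math.sqrt(dot(v, v))

v, w = (2, 1), (1, 2)
s = tuple(a + b for a, b in zip(v, w))            # v + w = (3, 3)

cos_theta = dot(v, w) / (length(v) * length(w))   # close to 4/5
schwarz_holds = abs(dot(v, w)) <= length(v) * length(w)
triangle_holds = length(s) <= length(v) + length(w)

print(cos_theta, schwarz_holds, triangle_holds)
```

Both inequalities hold with room to spare here. Equality would need one vector to be a multiple of the other.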


Example 6 The dot product of v = (a, b) and w = (b, a) is 2ab. Both lengths are
√(a² + b²). The Schwarz inequality v · w ≤ ||v|| ||w|| says that 2ab ≤ a² + b².

This is more famous if we write x = a² and y = b². The “geometric mean” √(xy)
is not larger than the “arithmetic mean” = average ½(x + y).


Geometric mean ≤ Arithmetic mean: ab ≤ (a² + b²)/2 becomes √(xy) ≤ (x + y)/2.


Example 5 had a = 2 and b = 1. So x = 4 and y = 1. The geometric mean √(xy) = 2
is below the arithmetic mean ½(1 + 4) = 2.5.


Notes on Computing 


MATLAB, Python and Julia work directly with whole vectors, not their components. 
When v and w have been defined, v + w is immediately understood. Input v and w
as rows; the prime ' transposes them to columns. 2v + 3w becomes 2 * v + 3 * w.
The result will be printed unless the line ends in a semicolon.

MATLAB v = [2 3 4]' ; w = [1 1 1]' ; u = 2 * v + 3 * w


The dot product v · w is a row vector times a column vector (use * instead of ·):


Instead of (1, 2) · (3, 4) we more often see [1 2] * [3 4]' or v' * w


The length of v is known to MATLAB as norm (v). This is sqrt(v' * v). Then find the
cosine from the dot product v' * w and the angle (in radians) that has that cosine:


Cosine formula cosine = v' * w/(norm (v) * norm (w))
The arc cosine angle = acos (cosine)


An M-file would create a new function cosine (v, w). Python and Julia are open source. 
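
For readers working in Python, here is a rough counterpart of those MATLAB lines. It is a sketch in plain Python with the standard `math` module (a numerical library would be the more usual choice, but none is assumed here); the function name `cosine` mirrors the M-file mentioned above, and the vectors are the v and w of the MATLAB line.

```python
import math

def cosine(v, w):
    """Cosine of the angle between nonzero vectors v and w."""
    dot = sum(a * b for a, b in zip(v, w))
    norm_v = math.sqrt(sum(a * a for a in v))
    norm_w = math.sqrt(sum(b * b for b in w))
    return dot / (norm_v * norm_w)

v = [2, 3, 4]
w = [1, 1, 1]
u = [2 * a + 3 * b for a, b in zip(v, w)]   # the combination 2v + 3w
angle = math.acos(cosine(v, w))             # in radians

print(u, cosine(v, w), angle)
```

For vectors that are parallel or nearly parallel, rounding can push the computed cosine a hair outside the interval from −1 to 1, so a careful version would clamp it before calling `math.acos`.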


REVIEW OF THE KEY IDEAS


1. The dot product v · w multiplies each component v_i by w_i and adds all v_i w_i.
2. The length ||v|| is the square root of v · v. Then u = v/||v|| is a unit vector: length 1.
3. The dot product is v · w = 0 when vectors v and w are perpendicular.
4. The cosine of θ (the angle between any nonzero v and w) never exceeds 1:

Cosine cos θ = (v · w) / (||v|| ||w||) Schwarz inequality |v · w| ≤ ||v|| ||w||.


WORKED EXAMPLES


1.2A For the vectors v = (3, 4) and w = (4, 3) test the Schwarz inequality on v · w
and the triangle inequality on ||v + w||. Find cos θ for the angle between v and w.
Which v and w give equality |v · w| = ||v|| ||w|| and ||v + w|| = ||v|| + ||w||?


Solution The dot product is v · w = (3)(4) + (4)(3) = 24. The length of v is
||v|| = √(9 + 16) = 5 and also ||w|| = 5. The sum v + w = (7, 7) has length 7√2 < 10.

Schwarz inequality |v · w| ≤ ||v|| ||w|| is 24 < 25.

Triangle inequality ||v + w|| ≤ ||v|| + ||w|| is 7√2 < 5 + 5.

Cosine of angle cos θ = 24/25. Thin angle from v = (3, 4) to w = (4, 3).

Equality: One vector is a multiple of the other as in w = cv. Then the angle is 0° or 180°.
In this case |cos θ| = 1 and |v · w| equals ||v|| ||w||. If the angle is 0°, as in w = 2v, then
||v + w|| = ||v|| + ||w|| (both sides give 3||v||). This v, 2v, 3v triangle is flat!

1.2B Find a unit vector u in the direction of v = (3, 4). Find a unit vector U that is
perpendicular to u. How many possibilities for U?


Solution For a unit vector u, divide v by its length ||v|| = 5. For a perpendicular vector
V we can choose (−4, 3) since the dot product v · V is (3)(−4) + (4)(3) = 0. For a unit
vector perpendicular to u, divide V by its length ||V||:


u = v/||v|| = (3/5, 4/5) U = V/||V|| = (−4/5, 3/5) u · U = 0


The only other perpendicular unit vector would be −U = (4/5, −3/5).


1.2C Find a vector x = (c, d) that has dot products x · r = 1 and x · s = 0 with
two given vectors r = (2, −1) and s = (−1, 2).


Solution Those two dot products give linear equations for c and d. Then x = (c, d).


x · r = 1 is 2c − d = 1
x · s = 0 is −c + 2d = 0
The same equations as in Worked Example 1.1 C
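
The two equations can be solved by hand and then verified with exact fractions. The values of c and d below are computed here by substitution (they are not quoted from the text), and the helper `dot` is an illustrative name.

```python
from fractions import Fraction

def dot(v, w):
    return sum(a * b for a, b in zip(v, w))

r = (2, -1)
s = (-1, 2)

# x . s = 0 reads -c + 2d = 0, so c = 2d.
# Substituting into x . r = 1 (that is, 2c - d = 1) gives 4d - d = 1, so d = 1/3.
d = Fraction(1, 3)
c = 2 * d
x = (c, d)

print(x, dot(x, r), dot(x, s))
```

Using `Fraction` keeps the check exact, so the two dot products come out as exactly 1 and 0 rather than as rounded decimals.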


Comment on n equations for x = (x1, ..., xn) in n-dimensional space


Section 1.1 would start with columns v_j. The goal is to produce x1v1 + ··· + xnvn = b.
This section would start from rows r_i. Now the goal is to find x with x · r_i = b_i.

Soon the v’s will be the columns of a matrix A, and the r’s will be the rows of A. 
Then the (one and only) problem will be to solve Ax = b. 


Problem Set 1.2 
1 Calculate the dot products u · v and u · w and u · (v + w) and w · v:

u = (−.6, .8) v = (4, 3) w = (1, 2)

2 Compute the lengths ||u|| and ||v|| and ||w|| of those vectors. Check the Schwarz
inequalities |u · v| ≤ ||u|| ||v|| and |v · w| ≤ ||v|| ||w||.


3 Find unit vectors in the directions of v and w in Problem 1, and the cosine of the
angle θ. Choose vectors a, b, c that make 0°, 90°, and 180° angles with w.


4 For any unit vectors v and w, find the dot products (actual numbers) of
(a) v and −v (b) v + w and v − w (c) v − 2w and v + 2w
5 Find unit vectors u1 and u2 in the directions of v = (1, 3) and w = (2, 1, 2).
Find unit vectors U1 and U2 that are perpendicular to u1 and u2.


6 (a) Describe every vector w = (w1, w2) that is perpendicular to v = (2, −1).
(b) All vectors perpendicular to V = (1, 1, 1) lie on a ____ in 3 dimensions.
(c) The vectors perpendicular to both (1, 1, 1) and (1, 2, 3) lie on a ____.


7 Find the angle θ (from its cosine) between these pairs of vectors:

(a) v = (1, √3) and w = (1, 0) (b) v = (2, 2, −1) and w = (2, −1, 2)
(c) v = (1, √3) and w = (−1, √3) (d) v = (3, 1) and w = (−1, −2)


8 True or false (give a reason if true or find a counterexample if false):

(a) If u = (1, 1, 1) is perpendicular to v and w, then v is parallel to w.
(b) If u is perpendicular to v and w, then u is perpendicular to v + 2w.
(c) If u and v are perpendicular unit vectors then ||u − v|| = √2. Yes!


9 The slopes of the arrows from (0, 0) to (v1, v2) and (w1, w2) are v2/v1 and w2/w1.
Suppose the product v2w2/v1w1 of those slopes is −1. Show that v · w = 0 and
the vectors are perpendicular. (The line y = 4x is perpendicular to y = −¼x.)


10 Draw arrows from (0, 0) to the points v = (1, 2) and w = (−2, 1). Multiply their
slopes. That answer is a signal that v · w = 0 and the arrows are ____.


11 If v · w is negative, what does this say about the angle between v and w? Draw a
3-dimensional vector v (an arrow), and show where to find all w’s with v · w < 0.


12 With v = (1, 1) and w = (1, 5) choose a number c so that w − cv is perpendicular
to v. Then find the formula for c starting from any nonzero v and w.


13 Find nonzero vectors v and w that are perpendicular to (1, 0, 1) and to each other.

14 Find nonzero vectors u, v, w that are perpendicular to (1, 1, 1, 1) and to each other.


15 The geometric mean of x = 2 and y = 8 is √(xy) = 4. The arithmetic mean is larger:
½(x + y) = ____. This would come in Example 6 from the Schwarz inequality for
v = (√2, √8) and w = (√8, √2). Find cos θ for this v and w.


16 How long is the vector v = (1, 1, ..., 1) in 9 dimensions? Find a unit vector u in
the same direction as v and a unit vector w that is perpendicular to v.


17 What are the cosines of the angles α, β, θ between the vector (1, 0, −1) and the unit
vectors i, j, k along the axes? Check the formula cos² α + cos² β + cos² θ = 1.


Problems 18-28 lead to the main facts about lengths and angles in triangles.


18 The parallelogram with sides v = (4, 2) and w = (−1, 2) is a rectangle. Check the
Pythagoras formula a² + b² = c² which is for right triangles only:


(length of v)² + (length of w)² = (length of v + w)².


19 (Rules for dot products) These equations are simple but useful:

(1) v · w = w · v (2) u · (v + w) = u · v + u · w (3) (cv) · w = c(v · w)

Use (2) with u = v + w to prove ||v + w||² = v · v + 2v · w + w · w.


20 The “Law of Cosines” comes from (v − w) · (v − w) = v · v − 2v · w + w · w:

Cosine Law ||v − w||² = ||v||² − 2||v|| ||w|| cos θ + ||w||².

Draw a triangle with sides v and w and v − w. Which of the angles is θ?


21 The triangle inequality says: (length of v + w) ≤ (length of v) + (length of w).
Problem 19 found ||v + w||² = ||v||² + 2v · w + ||w||². Increase that v · w to
||v|| ||w|| to show that ||side 3|| can not exceed ||side 1|| + ||side 2||:


Triangle inequality ||v + w||² ≤ (||v|| + ||w||)² or ||v + w|| ≤ ||v|| + ||w||.


[Figure labels: v = (v1, v2), w = (w1, w2), v + w]


22 The Schwarz inequality |v · w| ≤ ||v|| ||w|| by algebra instead of trigonometry:


(a) Multiply out both sides of (v1w1 + v2w2)² ≤ (v1² + v2²)(w1² + w2²).


(b) Show that the difference between those two sides equals (v1w2 − v2w1)².
This cannot be negative since it is a square, so the inequality is true.


23 The figure shows that cos α = v1/||v|| and sin α = v2/||v||. Similarly cos β is
____ and sin β is ____. The angle θ is β − α. Substitute into the trigonometry
formula cos β cos α + sin β sin α for cos(β − α) to find cos θ = v · w/||v|| ||w||.


24 One-line proof of the inequality |u · U| ≤ 1 for unit vectors (u1, u2) and (U1, U2):

|u · U| ≤ |u1| |U1| + |u2| |U2| ≤ (u1² + U1²)/2 + (u2² + U2²)/2 = 1.

Put (u1, u2) = (.6, .8) and (U1, U2) = (.8, .6) in that whole line and find cos θ.


25 Why is |cos θ| never greater than 1 in the first place?

26 (Recommended) Draw a parallelogram with two sides v and w. Show that the
squared diagonal lengths ||v + w||² + ||v − w||² add to the sum of four squared
side lengths 2||v||² + 2||w||².


27 If v = (1, 2) draw all vectors w = (x, y) in the xy plane with v · w = x + 2y = 5.
Why do those w’s lie along a line? Which is the shortest w?


28 (Recommended) If ||v|| = 5 and ||w|| = 3, what are the smallest and largest possible
values of ||v − w||? What are the smallest and largest possible values of v · w?


Challenge Problems


29 Can three vectors in the xy plane have u · v < 0 and v · w < 0 and u · w < 0?
I don’t know how many vectors in xyz space can have all negative dot products.
(Four of those vectors in the plane would certainly be impossible ...).


30 Pick any numbers that add to x + y + z = 0. Find the angle between your
vector v = (x, y, z) and the vector w = (z, x, y). Challenge question: Explain why
v · w/||v|| ||w|| is always −½.


31 How could you prove ∛(xyz) ≤ ⅓(x + y + z) (geometric mean ≤ arithmetic mean)?

32 Find 4 perpendicular unit vectors of the form (±½, ±½, ±½, ±½). Choose + or −.


33 Using v = randn(3, 1) in MATLAB, create a random unit vector u = v/||v||. Using
V = randn(3, 30) create 30 more random unit vectors U_j. What is the average size
of the dot products |u · U_j|? In calculus, the average is ∫_0^π |cos θ| dθ/π = 2/π.


1.3 Matrices


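1 $A = \begin{bmatrix} 1 & 2 \\ 3 & 4 \\ 5 & 6 \end{bmatrix}$ is a 3 by 2 matrix: m = 3 rows and n = 2 columns.

2 $Ax = \begin{bmatrix} 1 & 2 \\ 3 & 4 \\ 5 & 6 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix}$ is a combination of the columns: $Ax = x_1 \begin{bmatrix} 1 \\ 3 \\ 5 \end{bmatrix} + x_2 \begin{bmatrix} 2 \\ 4 \\ 6 \end{bmatrix}$.

3 The 3 components of Ax are dot products of the 3 rows of A with the vector x:

Row at a time $\begin{bmatrix} 1 & 2 \\ 3 & 4 \\ 5 & 6 \end{bmatrix} \begin{bmatrix} 7 \\ 8 \end{bmatrix} = \begin{bmatrix} 1 \cdot 7 + 2 \cdot 8 \\ 3 \cdot 7 + 4 \cdot 8 \\ 5 \cdot 7 + 6 \cdot 8 \end{bmatrix} = \begin{bmatrix} 23 \\ 53 \\ 83 \end{bmatrix}$

4 Equations in matrix form Ax = b: $\begin{bmatrix} 2 & 5 \\ 3 & 7 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} b_1 \\ b_2 \end{bmatrix}$ replaces $2x_1 + 5x_2 = b_1$ and $3x_1 + 7x_2 = b_2$.

5 The solution to Ax = b can be written as x = A⁻¹b. But some matrices don’t allow A⁻¹.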


This section starts with three vectors u, v, w. I will combine them using matrices. 


Three vectors $u = \begin{bmatrix} 1 \\ -1 \\ 0 \end{bmatrix} \quad v = \begin{bmatrix} 0 \\ 1 \\ -1 \end{bmatrix} \quad w = \begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix}$


Their linear combinations in three-dimensional space are x1u + x2v + x3w:


Combination of the vectors
$$x_1 \begin{bmatrix} 1 \\ -1 \\ 0 \end{bmatrix} + x_2 \begin{bmatrix} 0 \\ 1 \\ -1 \end{bmatrix} + x_3 \begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix} = \begin{bmatrix} x_1 \\ x_2 - x_1 \\ x_3 - x_2 \end{bmatrix}. \tag{1}$$


Now something important: Rewrite that combination using a matrix. The vectors u, v, w
go into the columns of the matrix A. That matrix “multiplies” the vector (x1, x2, x3):


Matrix times vector, combination of columns
$$Ax = \begin{bmatrix} 1 & 0 & 0 \\ -1 & 1 & 0 \\ 0 & -1 & 1 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} x_1 \\ x_2 - x_1 \\ x_3 - x_2 \end{bmatrix}. \tag{2}$$


The numbers x1, x2, x3 are the components of a vector x. The matrix A times the vector x
is the same as the combination x1u + x2v + x3w of the three columns in equation (1).


This is more than a definition of Ax, because the rewriting brings a crucial change
in viewpoint. At first, the numbers x1, x2, x3 were multiplying the vectors. Now the
matrix is multiplying those numbers. The matrix A acts on the vector x. The output
Ax is a combination b of the columns of A.
To see that action, I will write b1, b2, b3 for the components of Ax:


$$Ax = \begin{bmatrix} 1 & 0 & 0 \\ -1 & 1 & 0 \\ 0 & -1 & 1 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} x_1 \\ x_2 - x_1 \\ x_3 - x_2 \end{bmatrix} = \begin{bmatrix} b_1 \\ b_2 \\ b_3 \end{bmatrix} = b. \tag{3}$$


The input is x and the output is b = Ax. This A is a “difference matrix” because b
contains differences of the input vector x. The top difference is x1 − x0 = x1 − 0.


Here is an example to show differences of x = (1, 4, 9): squares in x, odd numbers in b.

$$x = \begin{bmatrix} 1 \\ 4 \\ 9 \end{bmatrix} = \text{squares} \qquad Ax = \begin{bmatrix} 1 - 0 \\ 4 - 1 \\ 9 - 4 \end{bmatrix} = \begin{bmatrix} 1 \\ 3 \\ 5 \end{bmatrix} = b. \tag{4}$$


That pattern would continue for a 4 by 4 difference matrix. The next square would be
x4 = 16. The next difference would be x4 − x3 = 16 − 9 = 7 (the next odd number).
The matrix finds all the differences 1, 3, 5, 7 at once.
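
The action of the difference matrix can be imitated without writing the matrix down. This is a sketch in plain Python; the function name `differences` is an illustrative choice, and the starting value 0 plays the role of x0 = 0.

```python
def differences(x):
    """Return (x1 - 0, x2 - x1, ..., xn - x_{n-1}), the action of the difference matrix."""
    b = []
    previous = 0            # plays the role of x0 = 0
    for xi in x:
        b.append(xi - previous)
        previous = xi
    return b

squares = [k * k for k in range(1, 5)]   # 1, 4, 9, 16
odd = differences(squares)
print(squares, odd)
```

The same loop handles a vector of any length, which is the 4 by 4 (or n by n) version of the pattern.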


Important Note: Multiplication a row at a time. You may already have learned about
multiplying Ax, a matrix times a vector. Probably it was explained differently, using the
rows instead of the columns. The usual way takes the dot product of each row with x:


Ax is also dot products with rows
$$Ax = \begin{bmatrix} 1 & 0 & 0 \\ -1 & 1 & 0 \\ 0 & -1 & 1 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} (1, 0, 0) \cdot (x_1, x_2, x_3) \\ (-1, 1, 0) \cdot (x_1, x_2, x_3) \\ (0, -1, 1) \cdot (x_1, x_2, x_3) \end{bmatrix}. \tag{5}$$


Those dot products are the same x1 and x2 − x1 and x3 − x2 that we wrote in equation (3).
The new way is to work with Ax a column at a time. Linear combinations are the key to
linear algebra, and the output Ax is a linear combination of the columns of A.

With numbers, you can multiply Ax by rows. With letters, columns are the good way.
Chapter 2 will repeat these rules of matrix multiplication, and explain the ideas.
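
Both ways of multiplying can be coded side by side to see that they agree. This is a sketch in plain Python with matrices stored as lists of rows; the function names are illustrative assumptions.

```python
def multiply_by_rows(A, x):
    """Each component of Ax is the dot product of a row of A with x."""
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]

def multiply_by_columns(A, x):
    """Ax is x1*(column 1) + ... + xn*(column n)."""
    b = [0] * len(A)
    for j, xj in enumerate(x):
        for i in range(len(A)):
            b[i] += xj * A[i][j]     # add xj times column j
    return b

A = [[1, 0, 0],
     [-1, 1, 0],
     [0, -1, 1]]
x = [1, 4, 9]
print(multiply_by_rows(A, x), multiply_by_columns(A, x))
```

The arithmetic is identical; only the order of the additions changes. The column version is the one that matches the idea of a linear combination.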


Linear Equations 


One more change in viewpoint is crucial. Up to now, the numbers x1, x2, x3 were known.
The right hand side b was not known. We found that vector of differences by multiplying
A times x. Now we think of b as known and we look for x.


Old question: Compute the linear combination x1u + x2v + x3w to find b.
New question: Which combination of u, v, w produces a particular vector b?


This is the inverse problem: to find the input x that gives the desired output b = Ax.
You have seen this before, as a system of linear equations for x1, x2, x3. The right hand
sides of the equations are b1, b2, b3. I will now solve that system Ax = b to find x1, x2, x3:

$$\text{Equations } Ax = b \quad \begin{aligned} x_1 &= b_1 \\ -x_1 + x_2 &= b_2 \\ -x_2 + x_3 &= b_3 \end{aligned} \qquad \text{Solution } x = A^{-1}b \quad \begin{aligned} x_1 &= b_1 \\ x_2 &= b_1 + b_2 \\ x_3 &= b_1 + b_2 + b_3. \end{aligned} \tag{6}$$


Let me admit right away: most linear systems are not so easy to solve. In this example,
the first equation decided x1 = b1. Then the second equation produced x2 = b1 + b2.
The equations can be solved in order (top to bottom) because A is a triangular matrix.

Look at two specific choices 0, 0, 0 and 1, 3, 5 of the right sides b1, b2, b3:


$$b = \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix} \text{ gives } x = \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix} \qquad b = \begin{bmatrix} 1 \\ 3 \\ 5 \end{bmatrix} \text{ gives } x = \begin{bmatrix} 1 \\ 1 + 3 \\ 1 + 3 + 5 \end{bmatrix} = \begin{bmatrix} 1 \\ 4 \\ 9 \end{bmatrix}$$


The first solution (all zeros) is more important than it looks. In words: If the output is
b = 0, then the input must be x = 0. That statement is true for this matrix A. It is not true
for all matrices. Our second example will show (for a different matrix C) how we can have
Cx = 0 when C ≠ 0 and x ≠ 0.

This matrix A is “invertible”. From b we can recover x. We write x as A⁻¹b.
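
Solving from the top equation down is a short loop. The sketch below assumes the difference equations x1 = b1, −x1 + x2 = b2, −x2 + x3 = b3 written out in full; the function name `solve_differences` is illustrative.

```python
def solve_differences(b):
    """Solve x1 = b1, -x1 + x2 = b2, -x2 + x3 = b3, ... from the top equation down."""
    x = []
    previous = 0
    for bi in b:
        previous = previous + bi   # x_i = x_{i-1} + b_i
        x.append(previous)
    return x

print(solve_differences([0, 0, 0]))
print(solve_differences([1, 3, 5]))
```

Each unknown is found from the one just above it. That only works because the matrix is triangular.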


The Inverse Matrix 


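Let me repeat the solution x in equation (6). A sum matrix will appear!


$$Ax = b \text{ is solved by } \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} b_1 \\ b_1 + b_2 \\ b_1 + b_2 + b_3 \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 \\ 1 & 1 & 0 \\ 1 & 1 & 1 \end{bmatrix} \begin{bmatrix} b_1 \\ b_2 \\ b_3 \end{bmatrix}. \tag{7}$$


If the differences of the x’s are the b’s, the sums of the b’s are the x’s. That was true for
the odd numbers b = (1, 3, 5) and the squares x = (1, 4, 9). It is true for all vectors.
The sum matrix in equation (7) is the inverse A⁻¹ of the difference matrix A.

Example: The differences of x = (1, 2, 3) are b = (1, 1, 1). So b = Ax and x = A⁻¹b:


$$Ax = \begin{bmatrix} 1 & 0 & 0 \\ -1 & 1 & 0 \\ 0 & -1 & 1 \end{bmatrix} \begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix} = \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix} \qquad A^{-1}b = \begin{bmatrix} 1 & 0 & 0 \\ 1 & 1 & 0 \\ 1 & 1 & 1 \end{bmatrix} \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix} = \begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix}$$


Equation (7) for the solution vector x = (x1, x2, x3) tells us two important facts:
1. For every b there is one solution to Ax = b. 2. The matrix A⁻¹ produces x = A⁻¹b.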
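
A quick check that the sum matrix undoes the difference matrix, in either order. This is a sketch in plain Python; the names `matvec`, `A` and `S` are illustrative, with S standing for the sum matrix of equation (7).

```python
def matvec(M, x):
    """Matrix times vector, one row at a time."""
    return [sum(m * xj for m, xj in zip(row, x)) for row in M]

A = [[1, 0, 0], [-1, 1, 0], [0, -1, 1]]   # difference matrix
S = [[1, 0, 0], [1, 1, 0], [1, 1, 1]]     # sum matrix from equation (7)

x = [1, 2, 3]
b = matvec(A, x)            # differences of x
recovered = matvec(S, b)    # sums of b
print(b, recovered)
```

Trying a few other vectors x gives the same round trip, which is what it means for S to be the inverse of A.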


The next chapters ask about other equations Ax = b. Is there a solution? How to find it? 


Note on calculus. Let me connect these special matrices to calculus. The vector x changes
to a function x(t). The differences Ax become the derivative dx/dt = b(t). In the
inverse direction, the sums A⁻¹b become the integral of b(t). Sums of differences are like
integrals of derivatives.

The Fundamental Theorem of Calculus says: integration is the inverse of differentiation.


$$Ax = b \text{ and } x = A^{-1}b \qquad \frac{dx}{dt} = b \text{ and } x(t) = \int_0^t b \, dt. \tag{8}$$


The differences of squares 0, 1, 4, 9 are odd numbers 1, 3, 5. The derivative of x(t) = t²
is 2t. A perfect analogy would have produced the even numbers b = 2, 4, 6 at times
t = 1, 2, 3. But differences are not the same as derivatives, and our matrix A produces not
2t but 2t − 1:


$$\text{Backward} \qquad x(t) - x(t-1) = t^2 - (t-1)^2 = t^2 - (t^2 - 2t + 1) = 2t - 1. \tag{9}$$


The Problem Set will follow up to show that “forward differences” produce 2t + 1.
The best choice (not always seen in calculus courses) is a centered difference that uses
x(t + 1) − x(t − 1). Divide that Δx by the distance Δt from t − 1 to t + 1, which is 2:


$$\text{Centered difference of } x(t) = t^2 \qquad \frac{(t+1)^2 - (t-1)^2}{2} = 2t \text{ exactly.} \tag{10}$$


Difference matrices are great. Centered is the best. Our second example is not invertible.
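
The three kinds of difference can be compared directly for x(t) = t². This is a sketch with illustrative function names; the step size is 1, as in the discussion above.

```python
def x(t):
    return t * t

def backward(t):
    return x(t) - x(t - 1)

def forward(t):
    return x(t + 1) - x(t)

def centered(t):
    return (x(t + 1) - x(t - 1)) / 2   # divide by the distance 2 from t-1 to t+1

for t in (1, 2, 3):
    print(t, backward(t), forward(t), centered(t))
```

Backward gives 2t − 1, forward gives 2t + 1, and the centered difference lands on the derivative 2t. For functions other than t² the centered difference is usually still the closest of the three, but not exact.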


Cyclic Differences 


This example keeps the same columns u and v but changes w to a new vector w*: 


jl 0 —1 
Second example u= | —1 C= 1 w= 0 
0 —1 1 


Now the linear combinations of u, v, w* lead to a cyclic difference matrix C: 


I 0 -1 Ly Tı — T3 
Cyclic Ca= | —1 1 0 xo | =E | z2 — zı | =b. (11) 
0 -l 1 T3 £3 — 29 


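This matrix C is not triangular. It is not so simple to solve for x when we are given b.
Actually it is impossible to find the solution to Cx = b, because the three equations either
have infinitely many solutions (sometimes) or else no solution (usually):


$$\begin{array}{l} Cx = 0 \\ \text{Infinitely} \\ \text{many } x \end{array} \qquad \begin{bmatrix} x_1 - x_3 \\ x_2 - x_1 \\ x_3 - x_2 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix} \text{ is solved by all vectors } \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} c \\ c \\ c \end{bmatrix}. \qquad (12)$$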


Every constant vector like x = (3, 3, 3) has zero differences when we go cyclically. The
undetermined constant c is exactly like the + C that we add to integrals. The cyclic differences
cycle around to x1 - x3 in the first component, instead of starting from x0 = 0.

The more likely possibility for Cx = b is no solution x at all:


$$Cx = b \qquad \begin{bmatrix} x_1 - x_3 \\ x_2 - x_1 \\ x_3 - x_2 \end{bmatrix} = \begin{bmatrix} 1 \\ 3 \\ 5 \end{bmatrix} \qquad \begin{array}{l} \text{Left sides add to } 0 \\ \text{Right sides add to } 9 \\ \text{No solution } x_1, x_2, x_3 \end{array} \qquad (13)$$
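Both cases can be seen by computing cyclic differences directly. In this sketch (plain Python) the function name `cyclic_differences` and the test vector (7, -2, 10) are illustrative choices, not taken from the text.

```python
def cyclic_differences(x):
    """Return Cx = (x1 - x3, x2 - x1, x3 - x2) for a vector x with three components."""
    x1, x2, x3 = x
    return [x1 - x3, x2 - x1, x3 - x2]

# A constant vector has zero cyclic differences, so Cx = 0 has many solutions.
print(cyclic_differences([3, 3, 3]))

# For any x, the three components of Cx add to zero.
b = cyclic_differences([7, -2, 10])
print(b, sum(b))

# The right side (1, 3, 5) adds to 9, so it can never equal Cx.
print(sum([1, 3, 5]))
```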


Look at this example geometrically. No combination of u, v, and w* will produce the
vector b = (1, 3, 5). The combinations don’t fill the whole three-dimensional space.
The right sides must have b1 + b2 + b3 = 0 to allow a solution to Cx = b, because
the left sides x1 - x3, x2 - x1, and x3 - x2 always add to zero. Put that in different words:


All linear combinations x1 u + x2 v + x3 w* lie on the plane given by b1 + b2 + b3 = 0.


This subject is suddenly connecting algebra with geometry. Linear combinations can fill all
of space, or only a plane. We need a picture to show the crucial difference between u, v, w
(the first example) and u, v, w* (all in the same plane).


[Figure: the vectors u = (1, -1, 0), v = (0, 1, -1) and w = (0, 0, 1) in the first picture; u, v and w* in the second picture.]


Figure 1.10: Independent vectors u, v, w. Dependent vectors u, v, w* in a plane.


Independence and Dependence 


Figure 1.10 shows those column vectors, first of the matrix A and then of C. The first two 
columns u and v are the same in both pictures. If we only look at the combinations of 
those two vectors, we will get a two-dimensional plane. The key question is whether the 
third vector is in that plane: 


Independence w is not in the plane of u and v. 
Dependence w* is in the plane of u and v. 


The important point is that the new vector w* is a linear combination of u and v: 


$$u + v + w^* = 0 \qquad w^* = \begin{bmatrix} -1 \\ 0 \\ 1 \end{bmatrix} = -u - v. \qquad (14)$$




All three vectors u, v, w* have components adding to zero. Then all their combinations
will have b1 + b2 + b3 = 0 (as we saw above, by adding the three equations). This is the
equation for the plane containing all combinations of u and v. By including w* we get
no new vectors because w* is already on that plane.

The original w = (0, 0, 1) is not on the plane: 0 + 0 + 1 ≠ 0. The combinations of
u, v, w fill the whole three-dimensional space. We know this already, because the solution
x = A^{-1}b in equation (6) gave the right combination to produce any b.

The two matrices A and C, with third columns w and w*, allowed me to mention two
key words of linear algebra: independence and dependence. The first half of the course will
develop these ideas much further. I am happy if you see them early in the two examples:


u, v, w are independent. No combination except 0u + 0v + 0w = 0 gives b = 0.
u, v, w* are dependent. Other combinations like u + v + w* give b = 0.
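The two statements can be tried out on the actual vectors. This is a minimal sketch in plain Python; the helper `combine` is a name introduced here, not something from the text.

```python
u = [1, -1, 0]
v = [0, 1, -1]
w = [0, 0, 1]
w_star = [-1, 0, 1]

def combine(coeffs, vectors):
    """Return the linear combination coeffs[0]*vectors[0] + coeffs[1]*vectors[1] + ..."""
    return [sum(c * vec[i] for c, vec in zip(coeffs, vectors)) for i in range(3)]

print(combine([1, 1, 1], [u, v, w_star]))  # u + v + w* is the zero vector
print(combine([1, 1, 1], [u, v, w]))       # u + v + w is not zero
print(combine([-1, -1], [u, v]))           # -u - v reproduces w*
```

The check of u + v + w shows only that this one combination is nonzero. Independence says more: no choice of coefficients other than 0, 0, 0 gives zero.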


You can picture this in three dimensions. The three vectors lie in a plane or they don’t. 
Chapter 2 has n vectors in n-dimensional space. Independence or dependence is the key 
point. The vectors go into the columns of an n by n matrix: 


Independent columns: Ax = 0 has one solution. A is an invertible matrix.


Dependent columns: Cx = 0 has many solutions. C is a singular matrix. 


Eventually we will have n vectors in m-dimensional space. The matrix A with those n 
columns is now rectangular (m by n). Understanding Ax = b is the problem of Chapter 3. 


REVIEW OF THE KEY IDEAS


1. Matrix times vector: Ax = combination of the columns of A.
2. The solution to Ax = b is x = A^{-1}b, when A is an invertible matrix.


3. The cyclic matrix C has no inverse. Its three columns lie in the same plane.
Those dependent columns add to the zero vector. Cx = 0 has many solutions.


4. This section is looking ahead to key ideas, not fully explained yet. 


WORKED EXAMPLES


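1.3 A Change the southwest entry a31 of A (row 3, column 1) to a31 = 1:


$$Ax = b \qquad \begin{bmatrix} 1 & 0 & 0 \\ -1 & 1 & 0 \\ 1 & -1 & 1 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} x_1 \\ -x_1 + x_2 \\ x_1 - x_2 + x_3 \end{bmatrix} = \begin{bmatrix} b_1 \\ b_2 \\ b_3 \end{bmatrix}$$


Find the solution x for any b. From x = A^{-1}b read off the inverse matrix A^{-1}.

Solution Solve the (linear triangular) system Ax = b from top to bottom:


$$\begin{array}{l} \text{first } x_1 = b_1 \\ \text{then } x_2 = b_1 + b_2 \\ \text{then } x_3 = b_2 + b_3 \end{array} \qquad \text{This says that } x = A^{-1}b = \begin{bmatrix} 1 & 0 & 0 \\ 1 & 1 & 0 \\ 0 & 1 & 1 \end{bmatrix} \begin{bmatrix} b_1 \\ b_2 \\ b_3 \end{bmatrix}$$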


This is good practice to see the columns of the inverse matrix multiplying b1, b2, and b3.
The first column of A^{-1} is the solution for b = (1, 0, 0). The second column is the solution
for b = (0, 1, 0). The third column x of A^{-1} is the solution for Ax = b = (0, 0, 1).

The three columns of A are still independent. They don’t lie in a plane. The combinations
of those three columns, using the right weights x1, x2, x3, can produce any
three-dimensional vector b = (b1, b2, b3). Those weights come from x = A^{-1}b.
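The top-to-bottom solution of 1.3 A can be turned into a short routine that builds the inverse one column at a time. This is a sketch in plain Python for the specific matrix of this example; the name `solve_lower` is hypothetical.

```python
def solve_lower(b):
    """Solve Ax = b from top to bottom for A = [[1, 0, 0], [-1, 1, 0], [1, -1, 1]]."""
    b1, b2, b3 = b
    x1 = b1              # row 1:  x1 = b1
    x2 = b2 + x1         # row 2: -x1 + x2 = b2
    x3 = b3 - x1 + x2    # row 3:  x1 - x2 + x3 = b3
    return [x1, x2, x3]

# Column j of the inverse is the solution for b = column j of the identity.
columns = [solve_lower(e) for e in ([1, 0, 0], [0, 1, 0], [0, 0, 1])]

# Rearrange the three solution vectors into the rows of the inverse matrix.
A_inv = [[columns[j][i] for j in range(3)] for i in range(3)]
for row in A_inv:
    print(row)
```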


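1.3 B This E is an elimination matrix. E has a subtraction and E^{-1} has an addition.


$$b = Ex \qquad \begin{bmatrix} b_1 \\ b_2 \end{bmatrix} = \begin{bmatrix} x_1 \\ x_2 - \ell x_1 \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ -\ell & 1 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} \qquad E = \begin{bmatrix} 1 & 0 \\ -\ell & 1 \end{bmatrix}$$


The first equation is x1 = b1. The second equation is x2 - ℓx1 = b2. The inverse will add
ℓb1 to b2, because the elimination matrix subtracted:


$$x = E^{-1}b \qquad \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} b_1 \\ \ell b_1 + b_2 \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ \ell & 1 \end{bmatrix} \begin{bmatrix} b_1 \\ b_2 \end{bmatrix} \qquad E^{-1} = \begin{bmatrix} 1 & 0 \\ \ell & 1 \end{bmatrix}$$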


1.3 C Change C from a cyclic difference to a centered difference producing x3 - x1:


$$Cx = b \qquad \begin{bmatrix} 0 & 1 & 0 \\ -1 & 0 & 1 \\ 0 & -1 & 0 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} x_2 - 0 \\ x_3 - x_1 \\ 0 - x_2 \end{bmatrix} = \begin{bmatrix} b_1 \\ b_2 \\ b_3 \end{bmatrix}. \qquad (15)$$


Cx = b can only be solved when b1 + b3 = x2 - x2 = 0. That is a plane of vectors b
in three-dimensional space. Each column of C is in the plane; the matrix has no inverse.
So this plane contains all combinations of those columns (which are all the vectors Cx).


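I included the zeros so you could see that this C produces “centered differences”.
Row i of Cx is x_{i+1} (right of center) minus x_{i-1} (left of center). Here is 4 by 4:


$$\begin{array}{l} Cx = b \\ \text{Centered} \\ \text{differences} \end{array} \qquad \begin{bmatrix} 0 & 1 & 0 & 0 \\ -1 & 0 & 1 & 0 \\ 0 & -1 & 0 & 1 \\ 0 & 0 & -1 & 0 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{bmatrix} = \begin{bmatrix} x_2 - 0 \\ x_3 - x_1 \\ x_4 - x_2 \\ 0 - x_3 \end{bmatrix} = \begin{bmatrix} b_1 \\ b_2 \\ b_3 \\ b_4 \end{bmatrix} \qquad (16)$$


Surprisingly this matrix is now invertible! The first and last rows tell you x2 and x3.
Then the middle rows give x1 and x4. It is possible to write down the inverse matrix C^{-1}.
But 5 by 5 will be singular (not invertible) again ...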
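The order of solution described for the 4 by 4 case (first and last rows, then the middle rows) can be followed step by step. This is a sketch in plain Python; the function names and the sample right side (1, 2, 3, 4) are assumptions made for illustration.

```python
def centered_differences(x):
    """Row i of Cx is x_{i+1} - x_{i-1}, with zeros assumed beyond both ends."""
    padded = [0] + list(x) + [0]
    return [padded[i + 2] - padded[i] for i in range(len(x))]

def solve_centered_4(b):
    """Solve the 4 by 4 system (16) in the order the text describes."""
    b1, b2, b3, b4 = b
    x2 = b1          # row 1: x2 - 0 = b1
    x3 = -b4         # row 4: 0 - x3 = b4
    x1 = x3 - b2     # row 2: x3 - x1 = b2
    x4 = b3 + x2     # row 3: x4 - x2 = b3
    return [x1, x2, x3, x4]

b = [1, 2, 3, 4]
x = solve_centered_4(b)
print(x, centered_differences(x))   # the second list should reproduce b
```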
Problem Set 1.3


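1 Find the linear combination 3s1 + 4s2 + 5s3 = b. Then write b as a matrix-vector
multiplication Sx, with 3, 4, 5 in x. Compute the three dot products (row of S) · x:


$$s_1 = \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix} \quad s_2 = \begin{bmatrix} 0 \\ 1 \\ 1 \end{bmatrix} \quad s_3 = \begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix} \text{ go into the columns of } S.$$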


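2 Solve these equations Sy = b with s1, s2, s3 in the columns of S:


$$\begin{bmatrix} 1 & 0 & 0 \\ 1 & 1 & 0 \\ 1 & 1 & 1 \end{bmatrix} \begin{bmatrix} y_1 \\ y_2 \\ y_3 \end{bmatrix} = \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix} \quad \text{and} \quad \begin{bmatrix} 1 & 0 & 0 \\ 1 & 1 & 0 \\ 1 & 1 & 1 \end{bmatrix} \begin{bmatrix} y_1 \\ y_2 \\ y_3 \end{bmatrix} = \begin{bmatrix} 1 \\ 4 \\ 9 \end{bmatrix}$$


S is a sum matrix. The sum of the first 5 odd numbers is ____.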


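3 Solve these three equations for y1, y2, y3 in terms of c1, c2, c3:


$$Sy = c \qquad \begin{bmatrix} 1 & 0 & 0 \\ 1 & 1 & 0 \\ 1 & 1 & 1 \end{bmatrix} \begin{bmatrix} y_1 \\ y_2 \\ y_3 \end{bmatrix} = \begin{bmatrix} c_1 \\ c_2 \\ c_3 \end{bmatrix}$$


Write the solution y as a matrix A = S^{-1} times the vector c. Are the columns of S
independent or dependent?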


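4 Find a combination x1 w1 + x2 w2 + x3 w3 that gives the zero vector with x1 = 1:


$$w_1 = \begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix} \qquad w_2 = \begin{bmatrix} 4 \\ 5 \\ 6 \end{bmatrix} \qquad w_3 = \begin{bmatrix} 7 \\ 8 \\ 9 \end{bmatrix}$$


Those vectors are (independent) (dependent). The three vectors lie in a ____.
The matrix W with those three columns is not invertible.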


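5 The rows of that matrix W produce three vectors (I write them as columns):


$$r_1 = \begin{bmatrix} 1 \\ 4 \\ 7 \end{bmatrix} \qquad r_2 = \begin{bmatrix} 2 \\ 5 \\ 8 \end{bmatrix} \qquad r_3 = \begin{bmatrix} 3 \\ 6 \\ 9 \end{bmatrix}$$


Linear algebra says that these vectors must also lie in a plane. There must be many
combinations with y1 r1 + y2 r2 + y3 r3 = 0. Find two sets of y’s.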


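6 Which numbers c give dependent columns so a combination of columns equals zero?


$$\begin{bmatrix} 1 & 1 & 0 \\ 3 & 2 & 1 \\ 7 & 4 & c \end{bmatrix} \qquad \begin{bmatrix} 1 & 0 & c \\ 1 & 1 & 0 \\ 0 & 1 & 1 \end{bmatrix} \qquad \begin{bmatrix} c & c & c \\ 2 & 1 & 5 \\ 3 & 3 & 6 \end{bmatrix} \quad \text{maybe always independent for } c \neq 0 \text{?}$$


7 If the columns combine into Ax = 0 then each of the rows has r · x = 0:


$$\begin{bmatrix} a_1 & a_2 & a_3 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix} \qquad \text{By rows} \quad \begin{bmatrix} r_1 \cdot x \\ r_2 \cdot x \\ r_3 \cdot x \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix}$$


The three rows also lie in a plane. Why is that plane perpendicular to x?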


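8 Moving to a 4 by 4 difference equation Ax = b, find the four components x1, x2,
x3, x4. Then write this solution as x = A^{-1}b to find the inverse matrix:


$$Ax = \begin{bmatrix} 1 & 0 & 0 & 0 \\ -1 & 1 & 0 & 0 \\ 0 & -1 & 1 & 0 \\ 0 & 0 & -1 & 1 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{bmatrix} = \begin{bmatrix} b_1 \\ b_2 \\ b_3 \\ b_4 \end{bmatrix} = b.$$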


9 What is the cyclic 4 by 4 difference matrix C? It will have 1 and -1 in each row and
each column. Find all solutions x = (x1, x2, x3, x4) to Cx = 0. The four columns
of C lie in a “three-dimensional hyperplane” inside four-dimensional space.


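10 A forward difference matrix Δ is upper triangular:


$$\Delta z = \begin{bmatrix} -1 & 1 & 0 \\ 0 & -1 & 1 \\ 0 & 0 & -1 \end{bmatrix} \begin{bmatrix} z_1 \\ z_2 \\ z_3 \end{bmatrix} = \begin{bmatrix} z_2 - z_1 \\ z_3 - z_2 \\ 0 - z_3 \end{bmatrix} = \begin{bmatrix} b_1 \\ b_2 \\ b_3 \end{bmatrix} = b.$$


Find z1, z2, z3 from b1, b2, b3. What is the inverse matrix in z = Δ^{-1}b?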


11 Show that the forward differences (t + 1)^2 - t^2 are 2t + 1 = odd numbers.
As in calculus, the difference (t + 1)^n - t^n will begin with the derivative of t^n,
which is ____.


12 The last lines of the Worked Example say that the 4 by 4 centered difference matrix
in (16) is invertible. Solve Cx = (b1, b2, b3, b4) to find its inverse in x = C^{-1}b.


Challenge Problems 


13 The very last words say that the 5 by 5 centered difference matrix is not invertible.
Write down the 5 equations Cx = b. Find a combination of left sides that gives
zero. What combination of b1, b2, b3, b4, b5 must be zero? (The 5 columns lie on a
“4-dimensional hyperplane” in 5-dimensional space. Hard to visualize.)


14 If (a, b) is a multiple of (c, d) with abcd ≠ 0, show that (a, c) is a multiple of (b, d).
This is surprisingly important; two columns are falling on one line. You could use
numbers first to see how a, b, c, d are related. The question will lead to:


$$\text{If } \begin{bmatrix} a & b \\ c & d \end{bmatrix} \text{ has dependent rows, then it also has dependent columns.}$$


Chapter 2 


Solving Linear Equations 


2.1 Vectors and Linear Equations 


1 The column picture of Ax = b: a combination of n columns of A produces the vector b.
2 This is a vector equation Ax = x1 a1 + ... + xn an = b: the columns of A are a1, a2, ..., an.
3 When b = 0, a combination Ax of the columns is zero: one possibility is x = (0, ..., 0).


4 The row picture of Ax = b: m equations from m rows give m planes meeting at x.


5 A dot product gives the equation of each plane: (row 1) · x = b1, ..., (row m) · x = bm.


6 When b = 0, all the planes (row i) · x = 0 go through the center point x = (0, 0, ..., 0).


The central problem of linear algebra is to solve a system of equations. Those equations 
are linear, which means that the unknowns are only multiplied by numbers—we never see 
x times y. Our first linear system is small. But you will see how far it leads: 


$$\begin{array}{ll} \text{Two equations} & x - 2y = 1 \\ \text{Two unknowns} & 3x + 2y = 11 \end{array} \qquad (1)$$


We begin a row at a time. The first equation x - 2y = 1 produces a straight line in the
xy plane. The point x = 1, y = 0 is on the line because it solves that equation. The point
x = 3, y = 1 is also on the line because 3 - 2 = 1. If we choose x = 101 we find y = 50.

The slope of this particular line is 1/2, because y increases by 1 when x changes by 2.
But slopes are important in calculus and this is linear algebra!

Figure 2.1 will show that first line x — 2y = 1. The second line in this “row picture” 
comes from the second equation 3x + 2y = 11. You can’t miss the point x = 3,y = 1 
where the two lines meet. That point (3, 1) lies on both lines and solves both equations.

Figure 2.1: Row picture: The point (3, 1) where the lines meet solves both equations.


ROWS The row picture shows two lines meeting at a single point (the solution). 


Turn now to the column picture. I want to recognize the same linear system as a “vector 
equation”. Instead of numbers we need to see vectors. If you separate the original system 
into its columns instead of its rows, you get a vector equation: 


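$$\text{Combination equals } b \qquad x \begin{bmatrix} 1 \\ 3 \end{bmatrix} + y \begin{bmatrix} -2 \\ 2 \end{bmatrix} = \begin{bmatrix} 1 \\ 11 \end{bmatrix} = b. \qquad (2)$$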


This has two column vectors on the left side. The problem is to find the combination of 
those vectors that equals the vector on the right. We are multiplying the first column by 
x and the second column by y, and adding. With the right choices x = 3 and y = 1 (the 
same numbers as before), this produces 3 (column 1) + 1 (column 2) = b. 


COLUMNS The column picture combines the column vectors on the left side to produce the vector b on the right side.


Figure 2.2 is the “column picture” of two equations in two unknowns. The first part 
shows the two separate columns, and that first column multiplied by 3. This multiplication 
by a scalar (a number) is one of the two basic operations in linear algebra: 


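$$\text{Scalar multiplication} \qquad 3 \begin{bmatrix} 1 \\ 3 \end{bmatrix} = \begin{bmatrix} 3 \\ 9 \end{bmatrix}.$$

If the components of a vector v are v1 and v2, then cv has components cv1 and cv2.
The other basic operation is vector addition. We add the first components and the
second components separately. The vector sum is (1, 11), the desired vector b.

$$\text{Vector addition} \qquad \begin{bmatrix} 3 \\ 9 \end{bmatrix} + \begin{bmatrix} -2 \\ 2 \end{bmatrix} = \begin{bmatrix} 1 \\ 11 \end{bmatrix}.$$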


The right side of Figure 2.2 shows this addition. Two vectors are in black. The sum along
the diagonal is the vector b = (1, 11) on the right side of the linear equations.

[Figure: left panel shows column 1, column 2, and 3 (column 1); right panel shows 3 (column 1) + 1 (column 2) = b.]


Figure 2.2: Column picture: A combination of columns produces the right side (1, 11).


To repeat: The left side of the vector equation is a linear combination of the columns. 
The problem is to find the right coefficients x = 3 and y = 1. We are combining scalar 
multiplication and vector addition into one step. That step is crucially important, because 
it contains both of the basic operations: Multiply by 3 and 1, then add. 


$$\text{Linear combination} \qquad 3 \begin{bmatrix} 1 \\ 3 \end{bmatrix} + \begin{bmatrix} -2 \\ 2 \end{bmatrix} = \begin{bmatrix} 1 \\ 11 \end{bmatrix}.$$
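Both pictures of the 2 by 2 system can be checked with a few lines of arithmetic. This is a minimal sketch in plain Python; the variable names `row_check` and `combo` are introduced here only for illustration.

```python
col1 = [1, 3]      # first column of the system
col2 = [-2, 2]     # second column of the system
b = [1, 11]        # right side
x, y = 3, 1        # the proposed solution

# Row picture: the point (x, y) must satisfy each equation separately.
row_check = (x - 2 * y == 1) and (3 * x + 2 * y == 11)

# Column picture: x (column 1) + y (column 2), one scalar multiplication
# per column followed by vector addition.
combo = [x * c1 + y * c2 for c1, c2 in zip(col1, col2)]

print(row_check, combo)
```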


Of course the solution x = 3,y = 1 is the same as in the row picture. I don’t know 
which picture you prefer ! I suspect that the two intersecting lines are more familiar at first. 
You may like the row picture better, but only for one day. My own preference is to combine 
column vectors. It is a lot easier to see a combination of four vectors in four-dimensional 
space, than to visualize how four hyperplanes might possibly meet at a point. (Even one 
hyperplane is hard enough. . .) 

The coefficient matrix on the left side of the equations is the 2 by 2 matrix A: 


$$\text{Coefficient matrix} \qquad A = \begin{bmatrix} 1 & -2 \\ 3 & 2 \end{bmatrix}.$$


This is very typical of linear algebra, to look at a matrix by rows and by columns. Its rows 
give the row picture and its columns give the column picture. Same numbers, different 
pictures, same equations. We combine those equations into a matrix problem Ax = b:


$$\text{Matrix equation } Ax = b \qquad \begin{bmatrix} 1 & -2 \\ 3 & 2 \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} 1 \\ 11 \end{bmatrix}.$$

The row picture deals with the two rows of A. The column picture combines the columns.
The numbers x = 3 and y = 1 go into the vector x. Here is matrix-vector multiplication:


$$\begin{array}{l} \text{Dot products with rows} \\ \text{Combination of columns} \end{array} \qquad \begin{bmatrix} 1 & -2 \\ 3 & 2 \end{bmatrix} \begin{bmatrix} 3 \\ 1 \end{bmatrix} = \begin{bmatrix} 1 \\ 11 \end{bmatrix}.$$


Looking ahead This chapter is going to solve n equations in n unknowns (for any n).
I am not going at top speed, because smaller systems allow examples and pictures and a
complete understanding. You are free to go faster, as long as matrix multiplication and
inversion become clear. Those two ideas will be the keys to invertible matrices.

I can list four steps to understanding elimination using matrices.


1. Elimination goes from A to a triangular U by a sequence of matrix steps E_ij.

2. The triangular system is solved by back substitution: working bottom to top. 
3. In matrix language A is factored into LU = (lower triangular) (upper triangular). 

4. Elimination succeeds if A is invertible. (But it may need row exchanges.) 
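Steps 1 and 2 of that list can be sketched in a few lines. The code below is an illustration under stated assumptions, not the book's algorithm or MATLAB's lu: it uses plain Python with exact fractions, it performs no row exchanges (it stops if a zero pivot appears), and the name `solve_by_elimination` is hypothetical.

```python
from fractions import Fraction

def solve_by_elimination(A, b):
    """Solve Ax = b by elimination to upper triangular form, then back substitution."""
    n = len(A)
    # Work on copies, using exact rational arithmetic.
    U = [[Fraction(a) for a in row] for row in A]
    c = [Fraction(bi) for bi in b]
    # Step 1: subtract multiples of each pivot row from the rows below it.
    for k in range(n):
        if U[k][k] == 0:
            raise ValueError("zero pivot: a row exchange would be needed")
        for i in range(k + 1, n):
            multiplier = U[i][k] / U[k][k]
            for j in range(k, n):
                U[i][j] -= multiplier * U[k][j]
            c[i] -= multiplier * c[k]
    # Step 2: back substitution, working from the bottom row to the top.
    x = [Fraction(0)] * n
    for i in reversed(range(n)):
        known = sum(U[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (c[i] - known) / U[i][i]
    return x

# The 2 by 2 system (1): x - 2y = 1 and 3x + 2y = 11.
print(solve_by_elimination([[1, -2], [3, 2]], [1, 11]))
```

For system (1) the single elimination step subtracts 3 times the first equation from the second, leaving 8y = 8, and back substitution then gives x = 3.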


The most-used algorithm in computational science takes those steps (MATLAB calls it lu). 
Its quickest form is backslash: x = A \ b. But linear algebra goes beyond square invertible 
matrices! For m by n matrices, Ax = 0 may have many solutions. Those solutions will 
go into a vector space. The rank of A leads to the dimension of that vector space. 

All this comes in Chapter 3, and I don’t want to hurry. But I must get there. 


Three Equations in Three Unknowns 


The three unknowns are x, y, z. We have three linear equations: 


$$Ax = b \qquad \begin{array}{rcl} x + 2y + 3z & = & 6 \\ 2x + 5y + 2z & = & 4 \\ 6x - 3y + z & = & 2 \end{array} \qquad (3)$$


We look for numbers x, y, z that solve all three equations at once. Those desired numbers 
might or might not exist. For this system, they do exist. When the number of unknowns 
matches the number of equations, in this case 3 = 3, there is usually one solution. 

Before solving the problem, we visualize it both ways: 


ROW The row picture shows three planes meeting at a single point. 


COLUMN The column picture combines three columns to produce b = (6, 4,2). 


In the row picture, each equation produces a plane in three-dimensional space. The first 
plane in Figure 2.3 comes from the first equation x + 2y + 3z = 6. That plane crosses the x 
and y and z axes at the points (6,0,0) and (0,3, 0) and (0, 0,2). Those three points solve 
the equation and they determine the whole plane.

The vector (x, y, z) = (0, 0, 0) does not solve x + 2y + 3z = 6. Therefore that plane
does not contain the origin. The plane x + 2y + 3z = 0 does pass through the origin, and it 
is parallel to x + 2y + 3z = 6. When the right side increases to 6, the parallel plane moves 
away from the origin. 

The second plane is given by the second equation 2x + 5y + 2z = 4. It intersects the 
first plane in a line L. The usual result of two equations in three unknowns is a line L of 
solutions. (Not if the equations were x + 2y + 3z = 6 and x + 2y + 3z = 0.)

The third equation gives a third plane. It cuts the line L at a single point. That point 
lies on all three planes and it solves all three equations. It is harder to draw this triple 
intersection point than to imagine it. The three planes meet at the solution (which we 
haven’t found yet). The column form will now show immediately why z = 2. 


[Figure: left panel shows the plane x + 2y + 3z = 6 and the plane 2x + 5y + 2z = 4 meeting in the line L; right panel shows the 3rd plane 6x - 3y + z = 2 cutting L at the solution. (0, 0, 2) is on all three planes.]


Figure 2.3: Row picture: Two planes meet at a line L. Three planes meet at a point.


The column picture starts with the vector form of the equations Ax = b: 


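$$\text{Combine columns} \qquad x \begin{bmatrix} 1 \\ 2 \\ 6 \end{bmatrix} + y \begin{bmatrix} 2 \\ 5 \\ -3 \end{bmatrix} + z \begin{bmatrix} 3 \\ 2 \\ 1 \end{bmatrix} = \begin{bmatrix} 6 \\ 4 \\ 2 \end{bmatrix} = b \qquad (4)$$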


The unknowns are the coefficients x, y, z. We want to multiply the three column vectors 
by the correct numbers x, y, z to produce b = (6, 4, 2). 

Figure 2.4 shows this column picture. Linear combinations of those columns can produce
any vector b! The combination that produces b = (6, 4, 2) is just 2 times the third
column. The coefficients we need are x = 0, y = 0, and z = 2.

The three planes in the row picture meet at that same solution point (0, 0, 2): 


Correct combination (x, y, z) = (0, 0, 2)

[Figure: column 1 = (1, 2, 6), column 2 = (2, 5, -3), and 2 times column 3, which is b = (6, 4, 2).]


Figure 2.4: Column picture: Combine the columns with weights (x, y, z) = (0, 0, 2).


The Matrix Form of the Equations 


We have three rows in the row picture and three columns in the column picture (plus the 
right side). The three rows and three columns contain nine numbers. These nine numbers 
fill a 3 by 3 matrix A: 


$$\text{The “coefficient matrix” in } Ax = b \text{ is} \qquad A = \begin{bmatrix} 1 & 2 & 3 \\ 2 & 5 & 2 \\ 6 & -3 & 1 \end{bmatrix}.$$


The capital letter A stands for all nine coefficients (in this square array). The letter
b denotes the column vector with components 6, 4, 2. The unknown x is also a column
vector, with components x, y, z. (We use boldface because it is a vector, x because it is
unknown.) By rows the equations were (3), by columns they were (4), and by matrices they
are (5):


$$\text{Matrix equation } Ax = b \qquad \begin{bmatrix} 1 & 2 & 3 \\ 2 & 5 & 2 \\ 6 & -3 & 1 \end{bmatrix} \begin{bmatrix} x \\ y \\ z \end{bmatrix} = \begin{bmatrix} 6 \\ 4 \\ 2 \end{bmatrix}. \qquad (5)$$


Basic question: What does it mean to “multiply A times x”? We can multiply by rows
or by columns. Either way, Ax = b must be a correct statement of the three equations. 
You do the same nine multiplications either way. 


Multiplication by rows Ax comes from dot products, each row times the column x:


$$Ax = \begin{bmatrix} (\text{row 1}) \cdot x \\ (\text{row 2}) \cdot x \\ (\text{row 3}) \cdot x \end{bmatrix}. \qquad (6)$$

Multiplication by columns Ax is a combination of column vectors:


Ax = x (column 1) + y (column 2) + z (column 3). (7)


When we substitute the solution x = (0,0, 2), the multiplication Ax produces b: 


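$$\begin{bmatrix} 1 & 2 & 3 \\ 2 & 5 & 2 \\ 6 & -3 & 1 \end{bmatrix} \begin{bmatrix} 0 \\ 0 \\ 2 \end{bmatrix} = 2 \text{ times column } 3 = \begin{bmatrix} 6 \\ 4 \\ 2 \end{bmatrix}.$$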


The dot product from the first row is (1,2,3) - (0,0,2) = 6. The other rows give dot 
products 4 and 2. This book sees Ax as a combination of the columns of A. 
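The two ways of multiplying use the same nine products in a different order. This is a short sketch in plain Python with the matrix and solution of this section; the names `by_rows` and `by_columns` are illustrative.

```python
A = [[1, 2, 3],
     [2, 5, 2],
     [6, -3, 1]]
x = [0, 0, 2]

# By rows: each component of Ax is the dot product (row i) . x
by_rows = [sum(a * xj for a, xj in zip(row, x)) for row in A]

# By columns: Ax = x[0] (column 1) + x[1] (column 2) + x[2] (column 3)
by_columns = [0, 0, 0]
for j in range(3):              # loop over columns
    for i in range(3):          # add x[j] times column j into the result
        by_columns[i] += x[j] * A[i][j]

print(by_rows, by_columns)
```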


Example 1 Here are 3 by 3 matrices A and I = identity, with three 1’s and six 0’s:


$$Ax = \begin{bmatrix} 1 & 0 & 0 \\ 1 & 0 & 0 \\ 1 & 0 & 0 \end{bmatrix} \begin{bmatrix} 4 \\ 5 \\ 6 \end{bmatrix} = \begin{bmatrix} 4 \\ 4 \\ 4 \end{bmatrix} \qquad Ix = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} 4 \\ 5 \\ 6 \end{bmatrix} = \begin{bmatrix} 4 \\ 5 \\ 6 \end{bmatrix}$$


If you are a row person, the dot product of (1, 0, 0) with (4, 5, 6) is 4. If you are a column
person, the linear combination Ax is 4 times the first column (1, 1, 1). In that matrix A,
the second and third columns are zero vectors.

The other matrix I is special. It has ones on the “main diagonal”. Whatever vector
this matrix multiplies, that vector is not changed. This is like multiplication by 1, but for
matrices and vectors. The exceptional matrix in this example is the 3 by 3 identity matrix:


$$I = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} \quad \text{always yields the multiplication } Ix = x.$$

Matrix Notation 


The first row of a 2 by 2 matrix contains a11 and a12. The second row contains a21 and
a22. The first index gives the row number, so that a_ij is an entry in row i. The second index
j gives the column number. But those subscripts are not very convenient on a keyboard!
Instead of a_ij we type A(i, j). The entry a57 = A(5, 7) would be in row 5, column 7.


$$A = \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix} = \begin{bmatrix} A(1,1) & A(1,2) \\ A(2,1) & A(2,2) \end{bmatrix}$$


For an m by n matrix, the row index i goes from 1 to m. The column index j stops at n.
There are mn entries a_ij = A(i, j). A square matrix of order n has n^2 entries.
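One practical caution when typing A(i, j) in other languages: the row and column indices above start at 1, while Python lists start at 0. The sketch below stores a matrix as a list of rows in plain Python; the helper `entry` is a hypothetical name that converts between the two conventions.

```python
A = [[1, 2, 3],
     [2, 5, 2]]          # m = 2 rows, n = 3 columns

def entry(M, i, j):
    """Return a_ij using the 1-based row index i and column index j of the text."""
    return M[i - 1][j - 1]

m, n = len(A), len(A[0])
print(m, n, m * n)          # an m by n matrix has mn entries
print(entry(A, 2, 1))       # row 2, column 1
print(entry(A, 1, 3))       # row 1, column 3
```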
Multiplication in MATLAB


I want to express A and x and their product Ax using MATLAB commands. This is a first
step in learning that language (and others). I begin by defining A and x. A vector x in R^n
is an n by 1 matrix (as in this book). Enter matrices a row at a time, and use a semicolon
to signal the end of a row. Or enter by columns and transpose by ' :


A = [1 2 3; 2 5 2; 6 -3 1]
x = [0 0 2]'   or   x = [0; 0; 2]


Here are three ways to multiply Ax in MATLAB. In reality, A * x is the good way to do it.
MATLAB is a high level language, and it works with matrices:


Matrix multiplication     b = A * x


We can also pick out the first row of A (as a smaller matrix!). The notation for that
1 by 3 submatrix is A(1,:). Here the colon symbol : keeps all columns of row 1.


Row at a time     b = [A(1,:) * x; A(2,:) * x; A(3,:) * x]


Each entry of b is a dot product, row times column, 1 by 3 matrix times 3 by 1 matrix.

The other way to multiply uses the columns of A. The first column is the 3 by 1
submatrix A(:,1). Now the colon symbol : comes first, to keep all rows of column 1.
This column multiplies x(1) and the other columns multiply x(2) and x(3):


Column at a time     b = A(:,1) * x(1) + A(:,2) * x(2) + A(:,3) * x(3)


I think that matrices are stored by columns. Then multiplying a column at a time will be a
little faster. So A * x is actually executed by columns.


Programming Languages for Mathematics and Statistics 


Here are five more important languages and their commands for the multiplication Ax:


Julia          A * x        julialang.org
Python         dot(A, x)    python.org
R              A %*% x      r-project.org
Mathematica    A . x        wolfram.com/mathematica
Maple          A * x        maplesoft.com


Julia, Python, and R are free and open source languages. R is developed particularly for 
applications in statistics. Other software for statistics (SAS, JMP, and many more) 
is described on Wikipedia’s Comparison of Statistical Packages. 


Mathematica and Maple allow symbolic entries a, b, x, ... and not only real numbers.
As in MATLAB’s Symbolic Toolbox, they work with symbolic expressions like x^2 - x.
The power of Mathematica is seen in Wolfram Alpha.

Julia combines the high productivity of SciPy or R for technical computing with
performance comparable to C or Fortran. It can call Python and C/Fortran libraries. But it
doesn’t rely on “vectorized” library functions for speed; Julia is designed to be fast.

I entered juliabox.org. I clicked Sign in via Google to access my gmail space. Then 
I clicked new at the right and chose a Julia notebook. I chose 0.4.5 and not one under 
development. The Julia command line came up immediately. 

As a novice, I computed 1 + 1. To see the answer I pressed Shift+Enter. I also 
learned that 1.0 + 1.0 uses floating point, much faster for a large problem. The website 
math.mit.edu/linearalgebra will show part of the power of Julia and Python and R. 


Python is a popular general-purpose programming language. When combined with 
packages like NumPy and the SciPy library, it provides a full-featured environment for 
technical computing. NumPy has the basic linear algebra commands. Download the
Anaconda Python distribution from https://www.continuum.io (a prepackaged collection of
Python and most important mathematical libraries, with a graphical installer).


R is free software for statistical computing and graphics. To download and install R, go 
to r-project.org (prefix https://www.). Commands are prompted by > and R is a scripted 
language. It works with lists that can be shaped into vectors and matrices. 

It is important to recommend RStudio for editing and graphing (and help resources). 
When you download from www.RStudio.com, a window opens for R commands—plus 
windows for editing and managing files and plots. Tell R the form of the matrix as well as 
the list of numerical entries: 

> A = matrix(c(1, 2, 3, 2, 5, 2, 6, -3, 1), nrow = 3, byrow = TRUE)

> x = matrix(c(0, 0, 2), nrow = 3)

To see A and x, type their names at the new prompt >. To multiply type b = A %*% x.
Transpose by t(A) and use as.matrix to turn a vector into a matrix.


MATLAB and Julia have a cleaner syntax for matrix computations than R. But R has
become very familiar and widely used. The website for this book has space for proper 
demos (including the Manipulate command) of MATLAB and Julia and Python and R. 


REVIEW OF THE KEY IDEAS


1. The basic operations on vectors are multiplication cv and vector addition v + w. 
2. Together those operations give linear combinations cv + dw. 


3. Matrix-vector multiplication Ax can be computed by dot products, a row at a time. 
But Ax must be understood as a combination of the columns of A. 


4. Column picture: Ax = b asks for a combination of columns to produce b. 


5. Row picture: Each equation in Ax = b gives a line (n = 2) or a plane (n = 3) 
or a “hyperplane” (n > 3). They intersect at the solution or solutions, if any.


WORKED EXAMPLES


2.1 A Describe the column picture of these three equations Ax = b. Solve by careful
inspection of the columns (instead of elimination):

$$\begin{array}{rcl} x + 3y + 2z & = & -3 \\ 2x + 2y + 2z & = & -2 \\ 3x + 5y + 6z & = & -5 \end{array} \quad \text{which is} \quad \begin{bmatrix} 1 & 3 & 2 \\ 2 & 2 & 2 \\ 3 & 5 & 6 \end{bmatrix} \begin{bmatrix} x \\ y \\ z \end{bmatrix} = \begin{bmatrix} -3 \\ -2 \\ -5 \end{bmatrix}$$

Solution The column picture asks for a linear combination that produces b from the
three columns of A. In this example b is minus the second column. So the solution is
x = 0, y = -1, z = 0. To show that (0, -1, 0) is the only solution we have to know that
“A is invertible” and “the columns are independent” and “the determinant isn’t zero.”
Those words are not yet defined but the test comes from elimination: We need
(and for this matrix we find) a full set of three nonzero pivots.

Suppose the right side changes to b = (4, 4, 8) = sum of the first two columns. Then
the good combination has x = 1, y = 1, z = 0. The solution becomes x = (1, 1, 0).
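The "careful inspection" in 2.1 A is easy to confirm by multiplying. This is a sketch in plain Python using the coefficient matrix of this example; the helper names `matvec` and `column` are assumptions made for illustration.

```python
A = [[1, 3, 2],
     [2, 2, 2],
     [3, 5, 6]]

def matvec(M, v):
    """Multiply matrix M (a list of rows) by vector v, one dot product per row."""
    return [sum(m * vj for m, vj in zip(row, v)) for row in M]

def column(M, j):
    """Return column j of M, counting columns from 1 as in the text."""
    return [row[j - 1] for row in M]

# b is minus the second column, so x = (0, -1, 0) solves Ax = b.
print(matvec(A, [0, -1, 0]), [-c for c in column(A, 2)])

# The sum of the first two columns is produced by x = (1, 1, 0).
print(matvec(A, [1, 1, 0]))
```

This confirms that the two vectors are solutions. It does not show they are the only solutions; that needs the three nonzero pivots mentioned above.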


2.1 B This system has no solution. The planes in the row picture don’t meet at a point.
No combination of the three columns produces b. How to show this?


$$\begin{array}{rcl} x + 3y + 5z & = & 4 \\ x + 2y - 3z & = & 5 \\ 2x + 5y + 2z & = & 8 \end{array} \qquad \begin{bmatrix} 1 & 3 & 5 \\ 1 & 2 & -3 \\ 2 & 5 & 2 \end{bmatrix} \begin{bmatrix} x \\ y \\ z \end{bmatrix} = \begin{bmatrix} 4 \\ 5 \\ 8 \end{bmatrix} = b$$


Idea Add (equation 1) + (equation 2) - (equation 3). The result is 0 = 1. This system
cannot have a solution. We could say: The vector (1, 1, -1) is orthogonal to all three
columns of A but not orthogonal to b.

(1) Are any two of the three planes parallel? What are the equations of planes parallel to
x + 3y + 5z = 4?

(2) Take the dot product of each column of A (and also b) with y = (1, 1, -1).
How do those dot products show that no combination of columns equals b?


(3) Find three different right side vectors b* and b** and b*** that do allow solutions. 


Solution 
(1) The planes don’t meet at a point, even though no two planes are parallel. For a plane
parallel to x + 3y + 5z = 4, change the “4”. The parallel plane x + 3y + 5z = 0
goes through the origin (0, 0, 0). And the equation multiplied by any nonzero constant
still gives the same plane, as in 2x + 6y + 10z = 8.


(2) The dot product of each column of A with y = (1, 1, —1) is zero. On the right side, 
y-b = (1,1,-1)- (4,5,8) = 1 is not zero. Ax = b led to 0 = 1: no solution. 


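(3) There is a solution when b is a combination of the columns. These three choices of
b have solutions including x* = (1, 0, 0) and x** = (1, 1, 1) and x*** = (0, 0, 0):


$$b^* = \begin{bmatrix} 1 \\ 1 \\ 2 \end{bmatrix} = \text{first column} \qquad b^{**} = \begin{bmatrix} 9 \\ 0 \\ 9 \end{bmatrix} = \text{sum of columns} \qquad b^{***} = \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix}$$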
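The dot products in part (2) and the right sides in part (3) can be computed directly. This is a minimal sketch in plain Python; `dot`, `columns` and `column_dots` are names introduced here for illustration.

```python
A = [[1, 3, 5],
     [1, 2, -3],
     [2, 5, 2]]
b = [4, 5, 8]
y = [1, 1, -1]

def dot(p, q):
    """Dot product of two vectors of equal length."""
    return sum(pi * qi for pi, qi in zip(p, q))

columns = [[row[j] for row in A] for j in range(3)]

# y is orthogonal to every column of A, so y . (Ax) = 0 for every x.
column_dots = [dot(y, col) for col in columns]
print(column_dots)

# But y . b is not zero, so b cannot equal Ax.
print(dot(y, b))

# A right side that does allow a solution: the sum of the columns, from x = (1, 1, 1).
b_sum = [sum(row) for row in A]
print(b_sum, dot(y, b_sum))
```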
Problem Set 2.1


Problems 1-8 are about the row and column pictures of Ax = b.


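1 With A = I (the identity matrix) draw the planes in the row picture. Three sides of
a box meet at the solution x = (x, y, z) = (2, 3, 4):


$$\begin{array}{rcl} 1x + 0y + 0z & = & 2 \\ 0x + 1y + 0z & = & 3 \\ 0x + 0y + 1z & = & 4 \end{array} \quad \text{or} \quad \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} x \\ y \\ z \end{bmatrix} = \begin{bmatrix} 2 \\ 3 \\ 4 \end{bmatrix}$$


Draw the vectors in the column picture. Two times column 1 plus three times column
2 plus four times column 3 equals the right side b.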


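2 If the equations in Problem 1 are multiplied by 2, 3, 4 they become DX = B:


$$\begin{array}{rcl} 2x + 0y + 0z & = & 4 \\ 0x + 3y + 0z & = & 9 \\ 0x + 0y + 4z & = & 16 \end{array} \quad \text{or} \quad DX = \begin{bmatrix} 2 & 0 & 0 \\ 0 & 3 & 0 \\ 0 & 0 & 4 \end{bmatrix} \begin{bmatrix} x \\ y \\ z \end{bmatrix} = \begin{bmatrix} 4 \\ 9 \\ 16 \end{bmatrix} = B$$


Why is the row picture the same? Is the solution X the same as x? What is changed
in the column picture: the columns or the right combination to give B?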


3 If equation 1 is added to equation 2, which of these are changed: the planes in the 
row picture, the vectors in the column picture, the coefficient matrix, the solution? 
The new equations in Problem 1 would be x = 2, x + y = 5, z = 4.


4 Find a point with z = 2 on the intersection line of the planes x + y + 3z = 6 and
x — y + z = 4. Find the point with z = 0. Find a third point halfway between. 


5 The first of these equations plus the second equals the third:


$$\begin{array}{rcl} x + y + z & = & 2 \\ x + 2y + z & = & 3 \\ 2x + 3y + 2z & = & 5. \end{array}$$

The first two planes meet along a line. The third plane contains that line, because
if x, y, z satisfy the first two equations then they also ____. The equations have
infinitely many solutions (the whole line L). Find three solutions on L.


6 Move the third plane in Problem 5 to a parallel plane 2x + 3y + 2z = 9. Now the
three equations have no solution. Why not? The first two planes meet along the line
L, but the third plane doesn’t ____ that line.


7 In Problem 5 the columns are (1, 1, 2) and (1, 2, 3) and (1, 1, 2). This is a “singular
case” because the third column is ____. Find two combinations of the columns that
give b = (2, 3, 5). This is only possible for b = (4, 6, c) if c = ____.


8 Normally 4 “planes” in 4-dimensional space meet at a ____. Normally 4 column
vectors in 4-dimensional space can combine to produce b. What combination
of (1, 0, 0, 0), (1, 1, 0, 0), (1, 1, 1, 0), (1, 1, 1, 1) produces b = (3, 3, 3, 2)? What 4
equations for x, y, z, t are you solving?


Problems 9-14 are about multiplying matrices and vectors.


9 Compute each Ax by dot products of the rows with the column vector:

        [ 1  2  4] [2]            [2 1 0 0] [1]
    (a) [−2  3  1] [2]        (b) [1 2 1 0] [1]
        [−4  1  2] [3]            [0 1 2 1] [1]
                                  [0 0 1 2] [2]


10 Compute each Ax in Problem 9 as a combination of the columns:

                         [ 1]     [2]     [4]   [ ]
    9(a) becomes Ax = 2  [−2] + 2 [3] + 3 [1] = [ ].
                         [−4]     [1]     [2]   [ ]

How many separate multiplications for Ax, when the matrix is "3 by 3"?


11 Find the two components of Ax by rows or by columns:

    [2 3] [4]        [3  6] [ 2]        [1 2 4] [3]
    [5 1] [2]  and   [6 12] [−1]  and   [2 0 1] [1]
                                                [1]


12 Multiply A times x to find three components of Ax:

    [0 0 1] [x]        [2 1 3] [ 1]        [2 1] [1]
    [0 1 0] [y]  and   [1 2 3] [ 1]  and   [1 2] [1]
    [1 0 0] [z]        [3 3 6] [−1]        [3 3]
13 (a) A matrix with m rows and n columns multiplies a vector with ____ components
to produce a vector with ____ components.


(b) The planes from the m equations Ax = b are in ____-dimensional space.
The combination of the columns of A is in ____-dimensional space.


14 Write 2x + 3y + z + 5t = 8 as a matrix A (how many rows?) multiplying the column
vector x = (x, y, z, t) to produce b. The solutions x fill a plane or "hyperplane"
in 4-dimensional space. The plane is 3-dimensional with no 4D volume.


Problems 15-22 ask for matrices that act in special ways on vectors. 


15 (a) What is the 2 by 2 identity matrix? I times [x; y] equals [x; y].

(b) What is the 2 by 2 exchange matrix? P times [x; y] equals [y; x].


16 (a) What 2 by 2 matrix R rotates every vector by 90°? R times [x; y] is [y; −x].

(b) What 2 by 2 matrix R² rotates every vector by 180°?


17 Find the matrix P that multiplies (x, y, z) to give (y, z, x). Find the matrix Q that
multiplies (y, z, x) to bring back (x, y, z).


18 What 2 by 2 matrix E subtracts the first component from the second component?
What 3 by 3 matrix does the same?

    E [3] = [3]        and        E [3]   [3]
      [5]   [2]                     [5] = [2]
                                    [7]   [7]


19 What 3 by 3 matrix E multiplies (x, y, z) to give (x, y, z + x)? What matrix E^{-1}
multiplies (x, y, z) to give (x, y, z − x)? If you multiply (3, 4, 5) by E and then
multiply by E^{-1}, the two results are (____) and (____).


20 What 2 by 2 matrix P1 projects the vector (x, y) onto the x axis to produce (x, 0)?
What matrix P2 projects onto the y axis to produce (0, y)? If you multiply (5, 7)
by P1 and then multiply by P2, you get (____) and (____).


21 What 2 by 2 matrix R rotates every vector through 45°? The vector (1, 0) goes to
(√2/2, √2/2). The vector (0, 1) goes to (−√2/2, √2/2). Those determine the
matrix. Draw these particular vectors in the xy plane and find R.


22 Write the dot product of (1, 4, 5) and (x, y, z) as a matrix multiplication Ax. The
matrix A has one row. The solutions to Ax = 0 lie on a ____ perpendicular to the
vector ____. The columns of A are only in ____-dimensional space.


23 In MATLAB notation, write the commands that define this matrix A and the column
vectors x and b. What command would test whether or not Ax = b?


    A = [1 2; 3 4]        x = [5; −2]        b = [1; 7]


24 The MATLAB commands A = eye(3) and v = [3:5]' produce the 3 by 3 identity
matrix and the column vector (3, 4, 5). What are the outputs from A*v and v'*v?
(Computer not needed!) If you ask for v*A, what happens?


25 If you multiply the 4 by 4 all-ones matrix A = ones(4) and the column v = ones(4,1),
what is A*v? (Computer not needed.) If you multiply B = eye(4) + ones(4) times
w = zeros(4,1) + 2*ones(4,1), what is B*w?


Questions 26-28 review the row and column pictures in 2, 3, and 4 dimensions.


26 Draw the row and column pictures for the equations x − 2y = 0, x + y = 6.


27 For two linear equations in three unknowns x, y, z, the row picture will show (2 or 3)
(lines or planes) in (2 or 3)-dimensional space. The column picture is in (2 or 3)-
dimensional space. The solutions normally lie on a ____.


28 For four linear equations in two unknowns x and y, the row picture shows four ____.
The column picture is in ____-dimensional space. The equations have no solution
unless the vector on the right side is a combination of ____.


29 Start with the vector u0 = (1, 0). Multiply again and again by the same "Markov
matrix" A = [.8 .3; .2 .7]. The next three vectors are u1, u2, u3:

    u1 = [.8 .3] [1] = [.8]        u2 = Au1 = ____        u3 = Au2 = ____.
         [.2 .7] [0]   [.2]

What property do you notice for all four vectors u0, u1, u2, u3?
Challenge Problems 
30 Continue Problem 29 from u0 = (1, 0) to u7, and also from v0 = (0, 1) to v7.
What do you notice about u7 and v7? Here are two MATLAB codes, with while and
for. They plot u0 to u7 and v0 to v7. You can use other languages:

    u = [1; 0]; A = [.8 .3; .2 .7];
    x = u; k = [0:7];
    while size(x,2) <= 7
      u = A*u; x = [x u];
    end
    plot(k, x)

    v = [0; 1]; A = [.8 .3; .2 .7];
    x = v; k = [0:7];
    for j = 1:7
      v = A*v; x = [x v];
    end
    plot(k, x)

The u's and v's are approaching a steady state vector s. Guess that vector and check
that As = s. If you start with s, you stay with s.
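
For readers without MATLAB, here is a minimal Python sketch of the same experiment. It is only an illustration: the helper names matvec and iterate are invented here, and the steady state is guessed by rounding u7 to one decimal, which is an assumption that the last lines then check.

```python
# Markov matrix from Problem 29, stored as a list of rows.
A = [[0.8, 0.3],
     [0.2, 0.7]]

def matvec(A, x):
    """Return A times x, one dot product per row."""
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]

def iterate(A, start, steps):
    """Return [u0, u1, ..., u_steps] where each vector is A times the one before."""
    vectors = [start]
    for _ in range(steps):
        vectors.append(matvec(A, vectors[-1]))
    return vectors

us = iterate(A, [1.0, 0.0], 7)   # u0 to u7
vs = iterate(A, [0.0, 1.0], 7)   # v0 to v7

# Guess the steady state by rounding u7 (an assumption), then test As = s.
s = [round(component, 1) for component in us[7]]
As = matvec(A, s)

for k, (u, v) in enumerate(zip(us, vs)):
    print(k, u, v)
print("guess s =", s, " A times s =", As)
```

Printing the two sequences side by side makes it easy to compare u7 with v7 without a plot.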


31 Invent a 3 by 3 magic matrix M3 with entries 1, 2, ..., 9. All rows and columns
and diagonals add to 15. The first row could be 8, 3, 4. What is M3 times (1, 1, 1)?
What is M4 times (1, 1, 1, 1) if a 4 by 4 magic matrix has entries 1, ..., 16?


32 Suppose u and v are the first two columns of a 3 by 3 matrix A. Which third columns
w would make this matrix singular? Describe a typical column picture of Ax = b
in that singular case, and a typical row picture (for a random b).


33 Multiplication by A is a "linear transformation". Those words mean:

If w is a combination of u and v, then Aw is the same combination of Au and Av.

It is this "linearity" Aw = cAu + dAv that gives us the name "linear algebra".

Problem: If u = [1; 0] and v = [0; 1] then Au and Av are the columns of A.


Combine w = cu + dv. If w = [5; 7] how is Aw connected to Au and Av?


34 Start from the four equations −x_{i+1} + 2x_i − x_{i−1} = i (for i = 1, 2, 3, 4 with
x_0 = x_5 = 0). Write those equations in their matrix form Ax = b. Can you solve
them for x_1, x_2, x_3, x_4?


35 A 9 by 9 Sudoku matrix S has the numbers 1, ..., 9 in every row and every column,
and in every 3 by 3 block. For the all-ones vector x = (1, ..., 1), what is Sx?


A better question is: Which row exchanges will produce another Sudoku matrix?
Also, which exchanges of block rows give another Sudoku matrix?


Section 2.7 will look at all possible permutations (reorderings) of the rows. I can see
6 orders for the first 3 rows, all giving Sudoku matrices. Also 6 permutations of the
next 3 rows, and of the last 3 rows. And 6 block permutations of the block rows?


2.2 The Idea of Elimination


1 For m = n = 3, there are three equations Ax = b and three unknowns x1, x2, x3.

2 The first two equations are a11 x1 + ··· = b1 and a21 x1 + ··· = b2.

3 Multiply the first equation by a21/a11 and subtract from the second: then x1 is eliminated.

4 The corner entry a11 is the first "pivot" and the ratio a21/a11 is the first "multiplier."

5 Eliminate x1 from every remaining equation i by subtracting ai1/a11 times the first equation.

6 Now the last n − 1 equations contain n − 1 unknowns x2, ..., xn. Repeat to eliminate x2.

7 Elimination breaks down if zero appears in the pivot. Exchanging two equations may save it.


This chapter explains a systematic way to solve linear equations. The method is called 
“elimination”, and you can see it immediately in our 2 by 2 example. Before elimination, 
x and y appear in both equations. After elimination, the first unknown x has disappeared
from the second equation 8y = 8: 


    Before     x − 2y = 1        After     x − 2y = 1     (multiply equation 1 by 3)
              3x + 2y = 11                     8y = 8     (subtract to eliminate 3x)


The new equation 8y = 8 instantly gives y = 1. Substituting y = 1 back into the first 
equation leaves x − 2 = 1. Therefore x = 3 and the solution (x, y) = (3, 1) is complete.


Elimination produces an upper triangular system—this is the goal. The nonzero 
coefficients 1, —2, 8 form a triangle. That system is solved from the bottom upwards— 
first y = 1 and then x = 3. This quick process is called back substitution. It is used for
upper triangular systems of any size, after elimination gives a triangle. 

Important point: The original equations have the same solution x = 3 and y = 1. 
Figure 2.5 shows each system as a pair of lines, intersecting at the solution point (3, 1).
After elimination, the lines still meet at the same point. Every step worked with correct 
equations. 


How did we get from the first pair of lines to the second pair? We subtracted 3 times 
the first equation from the second equation. The step that eliminates x from equation 2 is 
the fundamental operation in this chapter. We use it so often that we look at it closely: 


To eliminate x: Subtract a multiple of equation 1 from equation 2.

Three times x − 2y = 1 gives 3x − 6y = 3. When this is subtracted from 3x + 2y = 11,
the right side becomes 8. The main point is that 3x cancels 3x. What remains on the left
side is 2y − (−6y) or 8y, and x is eliminated. The system became triangular.

Ask yourself how that multiplier ℓ = 3 was found. The first equation contains 1x.
So the first pivot was 1 (the coefficient of x). The second equation contains 3x, so the
multiplier was 3. Then subtraction 3x − 3x produced the zero and the triangle.

You will see the multiplier rule if I change the first equation to 4x − 8y = 4. (Same
straight line but the first pivot becomes 4.) The correct multiplier is now ℓ = 3/4. To find the
multiplier, divide the coefficient "3" to be eliminated by the pivot "4":


    4x − 8y = 4       Multiply equation 1 by 3/4       4x − 8y = 4
    3x + 2y = 11      Subtract from equation 2              8y = 8.


The final system is triangular and the last equation still gives y = 1. Back substitution
produces 4x − 8 = 4 and 4x = 12 and x = 3. We changed the numbers but not the lines
or the solution. Divide by the pivot to find that multiplier ℓ = 3/4:


    Pivot = first nonzero in the row that does the elimination
    Multiplier = (entry to eliminate) divided by (pivot) = 3/4.


The new second equation starts with the second pivot, which is 8. We would use it to 
eliminate y from the third equation if there were one. To solve n equations we want n 
pivots. The pivots are on the diagonal of the triangle after elimination. 
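
The multiplier rule can be tried on both versions of the 2 by 2 example. The sketch below is only an illustration in Python (the function name eliminate_2x2 and its argument order are invented here, and it assumes both pivots are nonzero). Exact fractions keep the multiplier 3/4 from being rounded.

```python
from fractions import Fraction

def eliminate_2x2(a, b, f, c, d, g):
    """Solve ax + by = f, cx + dy = g by elimination.
    Assumes both pivots are nonzero (no breakdown)."""
    a, b, f, c, d, g = (Fraction(t) for t in (a, b, f, c, d, g))
    pivot1 = a
    multiplier = c / pivot1          # (entry to eliminate) divided by (pivot)
    pivot2 = d - multiplier * b      # new coefficient of y in equation 2
    rhs2 = g - multiplier * f        # new right side of equation 2
    y = rhs2 / pivot2                # back substitution: bottom equation first
    x = (f - b * y) / pivot1
    return multiplier, (pivot1, pivot2), (x, y)

# x - 2y = 1, 3x + 2y = 11  and then  4x - 8y = 4, 3x + 2y = 11
print(eliminate_2x2(1, -2, 1, 3, 2, 11))
print(eliminate_2x2(4, -8, 4, 3, 2, 11))
```

Both calls return the same solution and the same second pivot 8. Only the first pivot and the multiplier change.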

You could have solved those equations for x and y without reading this book. It is an 
extremely humble problem, but we stay with it a little longer. Even for a 2 by 2 system, 
elimination might break down. By understanding the possible breakdown (when we can’t 
find a full set of pivots), you will understand the whole process of elimination. 


[Figure: two panels. Before elimination: the lines x − 2y = 1 and 3x + 2y = 11. After elimination: the second line is horizontal.]

Figure 2.5: Eliminating x makes the second line horizontal. Then 8y = 8 gives y = 1.


Breakdown of Elimination 


Normally, elimination produces the pivots that take us to the solution. But failure is possi- 
ble. At some point, the method might ask us to divide by zero. We can’t do it. The process 
has to stop. There might be a way to adjust and continue—or failure may be unavoidable. 


Example 1 fails with no solution to 0y = 8. Example 2 fails with too many solutions to
0y = 0. Example 3 succeeds by exchanging the equations.

[Figure: the row picture includes the line 3x − 6y = 11; the column picture shows the first column (1, 3). Columns don't combine to give b = (1, 11).]

Figure 2.6: Row picture and column picture for Example 1: no solution.


Example 1 Permanent failure with no solution. Elimination makes this clear: 


     x − 2y = 1       Subtract 3 times         x − 2y = 1
    3x − 6y = 11      eqn. 1 from eqn. 2           0y = 8.


There is no solution to 0y = 8. Normally we divide the right side 8 by the second pivot,
but this system has no second pivot. (Zero is never allowed as a pivot!) The row and
column pictures in Figure 2.6 show why failure was unavoidable. If there is no solution,
elimination will discover that fact by reaching an equation like 0y = 8.

The row picture of failure shows parallel lines—which never meet. A solution must lie 
on both lines. With no meeting point, the equations have no solution. 

The column picture shows the two columns (1,3) and (—2, —6) in the same direction. 
All combinations of the columns lie along a line. But the column from the right side is in 
a different direction (1, 11). No combination of the columns can produce this right side— 
therefore no solution. 

When we change the right side to (1, 3), failure shows as a whole line of solution points. 
Instead of no solution, next comes Example 2 with infinitely many. 


Example 2 Failure with infinitely many solutions. Change b = (1, 11) to (1,3). 


     x − 2y = 1       Subtract 3 times         x − 2y = 1       Still only
    3x − 6y = 3       eqn. 1 from eqn. 2           0y = 0.      one pivot.


Every y satisfies 0y = 0. There is really only one equation x — 2y = 1. The unknown y is 
“free”. After y is freely chosen, x is determined as x = 1 + 2y. 

In the row picture, the parallel lines have become the same line. Every point on that 
line satisfies both equations. We have a whole line of solutions in Figure 2.7. 

In the column picture, b = (1,3) is now the same as column 1. So we can choose x = 1 
and y = 0. We can also choose x = 0 and y = −1/2; column 2 times −1/2 equals b. Every
(x, y) that solves the row problem also solves the column problem.

[Figure: row picture with the same line from both equations, solutions all along this line; column picture where the right hand side (1, 3) lies on the line of columns, with −1/2 (second column) = (1, 3).]

Figure 2.7: Row and column pictures for Example 2: infinitely many solutions.


Failure    For n equations we do not get n pivots
           Elimination leads to an equation 0 ≠ 0 (no solution) or 0 = 0 (many solutions)

Success    comes with n pivots. But we may have to exchange the n equations.


Elimination can go wrong in a third way—but this time it can be fixed. Suppose the first 
pivot position contains zero. We refuse to allow zero as a pivot. When the first equation 
has no term involving x, we can exchange it with an equation below: 


Example 3 Temporary failure (zero in pivot). A row exchange produces two pivots: 


    Permutation      0x + 2y = 4       Exchange the        3x − 2y = 5
                     3x − 2y = 5       two equations            2y = 4


The new system is already triangular. This small example is ready for back substitution. 
The last equation gives y = 2, and then the first equation gives x = 3. The row picture is 
normal (two intersecting lines). The column picture is also normal (column vectors not in 
the same direction). The pivots 3 and 2 are normal—but a row exchange was required. 
Examples 1 and 2 are singular—there is no second pivot. Example 3 is nonsingular— 
there is a full set of pivots and exactly one solution. Singular equations have no solution or 
infinitely many solutions. Pivots must be nonzero because we have to divide by them. 
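
The three 2 by 2 examples can be sorted by one small routine. This is a hedged Python sketch, not the book's code: the function name classify_2x2 and the wording of the returned strings are invented for illustration. It follows the text: exchange rows if the first pivot position holds zero, eliminate, then look at the second pivot and the right side.

```python
from fractions import Fraction

def classify_2x2(rows):
    """rows = [[a, b, f], [c, d, g]] for ax + by = f and cx + dy = g.
    Returns a short description of what elimination finds."""
    r1, r2 = ([Fraction(t) for t in row] for row in rows)
    exchanged = False
    if r1[0] == 0 and r2[0] != 0:
        r1, r2 = r2, r1                  # temporary failure: exchange the equations
        exchanged = True
    if r1[0] == 0:
        return "singular: no pivot in column 1"
    multiplier = r2[0] / r1[0]
    r2 = [p - multiplier * q for p, q in zip(r2, r1)]
    if r2[1] != 0:                       # a second pivot exists
        return "one solution (row exchange needed)" if exchanged else "one solution"
    # No second pivot: the last equation is 0y = r2[2].
    return "infinitely many solutions" if r2[2] == 0 else "no solution"

print(classify_2x2([[1, -2, 1], [3, -6, 11]]))   # Example 1
print(classify_2x2([[1, -2, 1], [3, -6, 3]]))    # Example 2
print(classify_2x2([[0, 2, 4], [3, -2, 5]]))     # Example 3
```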


Three Equations in Three Unknowns 


To understand Gaussian elimination, you have to go beyond 2 by 2 systems. Three by three 
is enough to see the pattern. For now the matrices are square—an equal number of rows 
and columns. Here is a 3 by 3 system, specially constructed so that all elimination steps
lead to whole numbers and not fractions:

     2x + 4y − 2z = 2
     4x + 9y − 3z = 8          (1)
    −2x − 3y + 7z = 10

What are the steps? The first pivot is the boldface 2 (upper left). Below that pivot we want
to eliminate the 4. The first multiplier is the ratio 4/2 = 2. Multiply the pivot equation by
ℓ21 = 2 and subtract. Subtraction removes the 4x from the second equation:


Step 1 Subtract 2 times equation 1 from equation 2. This leaves y + z = 4.


We also eliminate −2x from equation 3, still using the first pivot. The quick way is to add
equation 1 to equation 3. Then 2x cancels −2x. We do exactly that, but the rule in this book
is to subtract rather than add. The systematic pattern has multiplier ℓ31 = −2/2 = −1.
Subtracting −1 times an equation is the same as adding:


Step 2 Subtract −1 times equation 1 from equation 3. This leaves y + 5z = 12.

The two new equations involve only y and z. The second pivot (in boldface) is 1:

    x is eliminated       1y + 1z = 4
                          1y + 5z = 12


We have reached a 2 by 2 system. The final step eliminates y to make it 1 by 1: 


Step 3 Subtract equation 2new from 3new. The multiplier is 1/1 = 1. Then 4z = 8. 


The original Ax = b has been converted into an upper triangular Ux = c: 


     2x + 4y − 2z = 2          Ax = b           2x + 4y − 2z = 2
     4x + 9y − 3z = 8        has become              1y + 1z = 4        (2)
    −2x − 3y + 7z = 10         Ux = c                     4z = 8.


The goal is achieved—forward elimination is complete from A to U. Notice the pivots 
2,1, 4 along the diagonal of U. The pivots 1 and 4 were hidden in the original system. 
Elimination brought them out. Ux = c is ready for back substitution, which is quick: 


    (4z = 8 gives z = 2)     (y + z = 4 gives y = 2)     (equation 1 gives x = −1)


The solution is (x,y,z) = (—1, 2,2). The row picture has three planes from three equa- 
tions. All the planes go through this solution. The original planes are sloping, but the last 
plane 4z = 8 after elimination is horizontal. 

The column picture shows a combination Ax of column vectors producing the right
side b. The coefficients in that combination are −1, 2, 2 (the solution):


              [ 2]     [ 4]     [−2]           [ 2]
    Ax = (−1) [ 4] + 2 [ 9] + 2 [−3]  equals   [ 8] = b.       (3)
              [−2]     [−3]     [ 7]           [10]


The numbers x, y, z multiply columns 1, 2, 3 in Ax = b and also in the triangular Ux = c.

Elimination from A to U


For a 4 by 4 problem, or an n by n problem, elimination proceeds in the same way. Here is 
the whole idea, column by column from A to U, when Gaussian elimination succeeds. 


Column 1. Use the first equation to create zeros below the first pivot. 
Column 2. Use the new equation 2 to create zeros below the second pivot. 
Columns 3 to n. Keep going to find all n pivots and the upper triangular U. 


                               [x x x x]                  [x x x x]
    After column 2 we have     [0 x x x]       We want    [0 x x x]        (4)
                               [0 0 x x]                  [0 0 x x]
                               [0 0 x x]                  [0 0 0 x]


The result of forward elimination is an upper triangular system. It is nonsingular if there
is a full set of n pivots (never zero!). Question: Which x on the left won't be changed
in elimination because the pivot is known? Here is a final example to show the original
Ax = b, the triangular system Ux = c, and the solution (x, y, z) from back substitution:


    x +  y +  z = 6                    x + y + z = 6                 x = 3
    x + 2y + 2z = 9       Forward          y + z = 3       Back      y = 2
    x + 2y + 3z = 10                           z = 1                 z = 1


All multipliers are 1. All pivots are 1. All planes meet at the solution (3, 2, 1). The columns 
of A combine with 3, 2, 1 to give b = (6,9, 10). The triangle shows Ux = c = (6,3, 1). 
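
The column by column recipe from A to U, followed by back substitution, can be sketched in a few lines. This is a minimal Python illustration under a stated assumption: no row exchanges are needed, so every pivot turns out nonzero. The names forward_eliminate and back_substitute are invented here.

```python
from fractions import Fraction

def forward_eliminate(A, b):
    """Reduce Ax = b to Ux = c without row exchanges.
    Assumes every pivot is nonzero. Returns (U, c, multipliers)."""
    n = len(A)
    U = [[Fraction(t) for t in row] for row in A]
    c = [Fraction(t) for t in b]
    multipliers = {}
    for j in range(n):                        # pivot column j
        for i in range(j + 1, n):             # rows below the pivot
            l = U[i][j] / U[j][j]             # multiplier = entry / pivot
            multipliers[(i + 1, j + 1)] = l   # keyed by 1-based (row, column)
            U[i] = [p - l * q for p, q in zip(U[i], U[j])]
            c[i] -= l * c[j]
    return U, c, multipliers

def back_substitute(U, c):
    """Solve the upper triangular system Ux = c from the bottom upwards."""
    n = len(U)
    x = [Fraction(0)] * n
    for i in reversed(range(n)):
        known = sum(U[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (c[i] - known) / U[i][i]
    return x

U, c, mult = forward_eliminate([[2, 4, -2], [4, 9, -3], [-2, -3, 7]], [2, 8, 10])
print([U[i][i] for i in range(3)], c, back_substitute(U, c))

U2, c2, _ = forward_eliminate([[1, 1, 1], [1, 2, 2], [1, 2, 3]], [6, 9, 10])
print(c2, back_substitute(U2, c2))
```

The first call reproduces the pivots 2, 1, 4 and the solution (−1, 2, 2) of system (1). The second reproduces c = (6, 3, 1) and the solution (3, 2, 1) of the final example.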


= REVIEW OF THE KEY IDEAS = 


1. A linear system (Ax = b) becomes upper triangular (Ux = c) after elimination.

2. We subtract ℓij times equation j from equation i, to make the (i, j) entry zero.

3. The multiplier is ℓij = (entry to eliminate in row i) / (pivot in row j). Pivots can not be zero!

4. When zero is in the pivot position, exchange rows if there is a nonzero below it.

5. The upper triangular Ux = c is solved by back substitution (starting at the bottom).

6. When breakdown is permanent, Ax = b has no solution or infinitely many.


= WORKED EXAMPLES =


2.2A When elimination is applied to this matrix A, what are the first and second pivots?
What is the multiplier ℓ21 in the first step (ℓ21 times row 1 is subtracted from row 2)?


        [1 1 0]        [1 1 0]        [1 1 0]
    A = [1 2 1]   →    [0 1 1]   →    [0 1 1]
        [0 1 2]        [0 1 2]        [0 0 1]


What entry in the 2, 2 position (instead of 2) would force an exchange of rows 2 and 3?
Why is the lower left multiplier ℓ31 = 0, subtracting zero times row 1 from row 3?
If you change the corner entry from a33 = 2 to a33 = 1, why does elimination fail?


Solution The first pivot is 1. The multiplier ℓ21 is 1/1 = 1. When 1 times row 1 is subtracted
from row 2, the second pivot is revealed as another 1. If the original middle entry had been
1 instead of 2, that would have forced a row exchange.

The multiplier ℓ31 is zero because a31 = 0. A zero at the start of a row needs no
elimination. This A is a "band matrix". Everything stays zero outside the band.

The last pivot is also 1. So if the original corner entry a33 = 2 is reduced by 1, elimination
would produce 0. No third pivot, elimination fails.


2.2B Suppose A is already a triangular matrix (upper triangular or lower triangular). 
Where do you see its pivots? When does Ax = b have exactly one solution for every b?


Solution The pivots of a triangular matrix are already set along the main diagonal. Elim- 
ination succeeds when all those numbers are nonzero. Use back substitution when A is 
upper triangular, go forward when A is lower triangular. 


2.2C Use elimination to reach upper triangular matrices U. Solve by back substitution 
or explain why this is impossible. What are the pivots (never zero)? Exchange equations 
when necessary. The only difference is the −x in the last equation.


    Success     x + y + z = 7         Failure      x + y + z = 7
                x + y − z = 5                      x + y − z = 5
                x − y + z = 3                     −x − y + z = 3


Solution For the first system, subtract equation 1 from equations 2 and 3 (the multipliers
are ℓ21 = 1 and ℓ31 = 1). The 2, 2 entry becomes zero, so exchange equations 2 and 3:


                x + y + z = 7                                x + y + z = 7
    Success       0y − 2z = −2       exchanges into           −2y + 0z = −4
                −2y + 0z = −4                                     −2z = −2


Then back substitution gives z = 1 and y = 2 and x = 4. The pivots are 1, −2, −2.
For the second system, subtract equation 1 from equation 2 as before. Add equation 1
to equation 3. This leaves zero in the 2, 2 entry and also below:


                x + y + z = 7        There is no pivot in column 2 (it was the same as column 1)
    Failure       0y − 2z = −2       A further elimination step gives 0z = 8
                  0y + 2z = 10       The three planes don't meet


Plane 1 meets plane 2 in a line. Plane 1 meets plane 3 in a parallel line. No solution.
If we change the "3" in the original third equation to "−5" then elimination would lead
to 0 = 0. There are infinitely many solutions! The three planes now meet along a whole line.
Changing 3 to −5 moved the third plane to meet the other two. The second equation
gives z = 1. Then the first equation leaves x + y = 6. No pivot in column 2 makes y free
(free variables can have any value). Then x = 6 − y.
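
Both systems of 2.2C can be run through one routine that exchanges rows when it must. The Python sketch below is an illustration only (the name eliminate_with_exchanges is invented here). It works on the augmented rows [A b], swaps in the first nonzero entry below a zero pivot position, and stops at the first column that has no pivot at all.

```python
from fractions import Fraction

def eliminate_with_exchanges(A, b):
    """Forward elimination on [A b] with row exchanges when needed.
    Returns (rows, pivots). Fewer than n pivots means the system is singular;
    in that case the routine stops at the first column with no pivot."""
    n = len(A)
    rows = [[Fraction(t) for t in row] + [Fraction(rhs)] for row, rhs in zip(A, b)]
    pivots = []
    for j in range(n):
        candidates = [i for i in range(j, n) if rows[i][j] != 0]
        if not candidates:
            return rows, pivots                 # permanent breakdown
        if candidates[0] != j:                  # zero in the pivot position
            rows[j], rows[candidates[0]] = rows[candidates[0]], rows[j]
        pivots.append(rows[j][j])
        for i in range(j + 1, n):
            l = rows[i][j] / rows[j][j]
            rows[i] = [p - l * q for p, q in zip(rows[i], rows[j])]
    return rows, pivots

success = eliminate_with_exchanges([[1, 1, 1], [1, 1, -1], [1, -1, 1]], [7, 5, 3])
failure = eliminate_with_exchanges([[1, 1, 1], [1, 1, -1], [-1, -1, 1]], [7, 5, 3])
print(success[1])   # pivots of the first system
print(failure)      # rows where elimination stopped, and the single pivot found
```

For the failure system the returned rows are the equations 0y − 2z = −2 and 0y + 2z = 10 from the solution above, with only one pivot found.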


Problem Set 2.2 


Problems 1-10 are about elimination on 2 by 2 systems. 


1 What multiple ℓ21 of equation 1 should be subtracted from equation 2?

     2x + 3y = 1
    10x + 9y = 11.


After elimination, write down the upper triangular system and circle the two pivots. 
The numbers 1 and 11 don’t affect the pivots—use them now in back substitution. 


2 Solve the triangular system of Problem 1 by back substitution, y before x. Verify 
that x times (2,10) plus y times (3,9) equals (1,11). If the right side changes to 
(4,44), what is the new solution? 


3 What multiple of equation 1 should be subtracted from equation 2?

    2x − 4y = 6
    −x + 5y = 0.


After this elimination step, solve the triangular system. If the right side changes to 
(—6, 0), what is the new solution? 


4 What multiple ℓ of equation 1 should be subtracted from equation 2 to remove c?

    ax + by = f
    cx + dy = g.

The first pivot is a (assumed nonzero). Elimination produces what formula for the
second pivot? What is y? The second pivot is missing when ad = bc: singular.


5 Choose a right side which gives no solution and another right side which gives
infinitely many solutions. What are two of those solutions?

    Singular system     3x + 2y = 10
                        6x + 4y = ____


6 Choose a coefficient b that makes this system singular. Then choose a right side g
that makes it solvable. Find two solutions in that singular case.

    2x + by = 16
    4x + 8y = g.


7 For which numbers a does elimination break down (1) permanently (2) temporarily?

    ax + 3y = −3
    4x + 6y = 6.

Solve for x and y after fixing the temporary breakdown by a row exchange.


8 For which three numbers k does elimination break down? Which is fixed by a row
exchange? In each case, is the number of solutions 0 or 1 or ∞?

    kx + 3y = 6
    3x + ky = −6.


9 What test on b1 and b2 decides whether these two equations allow a solution? How
many solutions will they have? Draw the column picture for b = (1, 2) and (1, 0).

    3x − 2y = b1
    6x − 4y = b2.


10 In the xy plane, draw the lines x + y = 5 and x + 2y = 6 and the equation y = ____
that comes from elimination. The line 5x − 4y = c will go through the solution of
these equations if c = ____.


Problems 11-20 study elimination on 3 by 3 systems (and possible failure). 


11 (Recommended) A system of linear equations can't have exactly two solutions. Why?

(a) If (x, y, z) and (X, Y, Z) are two solutions, what is another solution?

(b) If 25 planes meet at two points, where else do they meet?


12 Reduce this system to upper triangular form by two row operations:

    2x + 3y +  z = 8
    4x + 7y + 5z = 20
       − 2y + 2z = 0.

Circle the pivots. Solve by back substitution for z, y, x.


13 Apply elimination (circle the pivots) and back substitution to solve

    2x − 3y      = 3
    4x − 5y +  z = 7
    2x −  y − 3z = 5.

List the three row operations: Subtract ____ times row ____ from row ____.


14 Which number d forces a row exchange, and what is the triangular system (not
singular) for that d? Which d makes this system singular (no third pivot)?

    2x + 5y + z = 0
    4x + dy + z = 2
          y − z = 3.


15 Which number b leads later to a row exchange? Which b leads to a missing pivot? In
that singular case find a nonzero solution x, y, z.

    x + by     = 0
    x − 2y − z = 0
         y + z = 0.


16 (a) Construct a 3 by 3 system that needs two row exchanges to reach a triangular
form and a solution.

(b) Construct a 3 by 3 system that needs a row exchange to keep going, but breaks
down later.


17 If rows 1 and 2 are the same, how far can you get with elimination (allowing row
exchange)? If columns 1 and 2 are the same, which pivot is missing?

    Equal     2x − y + z = 0        2x + 2y + z = 0     Equal
    rows      2x − y + z = 0        4x + 4y + z = 0     columns
              4x + y + z = 2        6x + 6y + z = 2.


18 Construct a 3 by 3 example that has 9 different coefficients on the left side, but
rows 2 and 3 become zero in elimination. How many solutions to your system with
b = (1, 10, 100) and how many with b = (0, 0, 0)?


19 Which number q makes this system singular and which right side t gives it infinitely
many solutions? Find the solution that has z = 1.

    x + 4y − 2z = 1
    x + 7y − 6z = 6
        3y + qz = t.


20 Three planes can fail to have an intersection point, even if no planes are parallel.
The system is singular if row 3 of A is a ____ of the first two rows. Find a third
equation that can't be solved together with x + y + z = 0 and x − 2y − z = 1.


21 Find the pivots and the solution for both systems (Ax = b and Kx = b):

    2x +  y           = 0           2x −  y           = 0
     x + 2y +  z      = 0           −x + 2y −  z      = 0
          y + 2z +  t = 0               − y + 2z −  t = 0
               z + 2t = 5                    − z + 2t = 5.


22 If you extend Problem 21 following the 1, 2, 1 pattern or the −1, 2, −1 pattern, what
is the fifth pivot? What is the nth pivot? K is my favorite matrix.


23 If elimination leads to x + y = 1 and 2y = 3, find three possible original problems.


24 For which two numbers a will elimination fail on A = [a 2; a a]?


25 For which three numbers a will elimination fail to give three pivots?

        [a 2 3]
    A = [a a 4]   is singular for three values of a.
        [a a a]


26 Look for a matrix that has row sums 4 and 8, and column sums 2 and s:

    Matrix = [a b]        a + b = 4     a + c = 2
             [c d]        c + d = 8     b + d = s

The four equations are solvable only if s = ____. Then find two different matrices
that have the correct row and column sums. Extra credit: Write down the 4 by 4
system Ax = b with x = (a, b, c, d) and make A triangular by elimination.


27 Elimination in the usual order gives what matrix U and what solution to this "lower
triangular" system? We are really solving by forward substitution:

    3x           = 3
    6x + 2y      = 8
    9x − 2y + z  = 9.


28 Create a MATLAB command A(2,:) = ... for the new row 2, to subtract 3 times row
1 from the existing row 2 if the matrix A is already known.


Challenge Problems


29 Find experimentally the average 1st and 2nd and 3rd pivot sizes from MATLAB's
[L, U] = lu(rand(3)). The average size abs(U(1,1)) is above 1/2 because lu picks
the largest available pivot in column 1. Here A = rand(3) has random entries
between 0 and 1.


30 If the last corner entry is A(5,5) = 11 and the last pivot of A is U(5,5) = 4, what
different entry A(5,5) would have made A singular?


31 Suppose elimination takes A to U without row exchanges. Then row j of U is a
combination of which rows of A? If Ax = 0, is Ux = 0? If Ax = b, is Ux = b?
If A starts out lower triangular, what is the upper triangular U?


32 Start with 100 equations Ax = 0 for 100 unknowns x = (x1, ..., x100). Suppose
elimination reduces the 100th equation to 0 = 0, so the system is "singular".


(a) Elimination takes linear combinations of the rows. So this singular system has
the singular property: Some linear combination of the 100 rows is ____.


(b) Singular systems Ax = 0 have infinitely many solutions. This means that some
linear combination of the 100 columns is ____.


(c) Invent a 100 by 100 singular matrix with no zero entries.


(d) For your matrix, describe in words the row picture and the column picture of
Ax = 0. Not necessary to draw 100-dimensional space.


2.3 Elimination Using Matrices


1 The first step multiplies the equations Ax = b by a matrix E21 to produce E21 Ax = E21 b.

2 That matrix E21 A has a zero in row 2, column 1 because x1 is eliminated from equation 2.

3 E21 is the identity matrix (diagonal of 1's) minus the multiplier a21/a11 in row 2, column 1.

4 Matrix-matrix multiplication is n matrix-vector multiplications: EA = [Ea1 ... Ean].

5 We must also multiply Eb! So E is multiplying the augmented matrix [A b] = [a1 ... an b].

6 Elimination multiplies Ax = b by E21, E31, ..., En1, then E32, E42, ..., En2, and onward.

7 The row exchange matrix is not Eij but Pij. To find Pij, exchange rows i and j of I.


This section gives our first examples of matrix multiplication. Naturally we start with 
matrices that contain many zeros. Our goal is to see that matrices do something. E acts on 
a vector b or a matrix A to produce a new vector Eb or a new matrix EA. 

Our first examples will be “elimination matrices.” They execute the elimination steps. 
Multiply the jth equation by ℓij and subtract from the ith equation. (This eliminates
xj from equation i.) We need a lot of these simple matrices Eij, one for every nonzero
to be eliminated below the main diagonal.
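
To see one of these matrices act, here is a short Python sketch using the 3 by 3 example of the previous section. It is an illustration under our own naming (identity, elimination_matrix and matmul are invented helpers, not library calls). E21 is built as the identity with −ℓ21 placed in row 2, column 1, and the product is formed one column at a time.

```python
def identity(n):
    """The n by n identity matrix as a list of rows."""
    return [[1 if i == j else 0 for j in range(n)] for i in range(n)]

def elimination_matrix(n, i, j, multiplier):
    """E_ij: the identity with -multiplier in row i, column j (1-based indices)."""
    E = identity(n)
    E[i - 1][j - 1] = -multiplier
    return E

def matmul(E, A):
    """E times A, column by column: each column of EA is E times a column of A."""
    columns = list(zip(*A))
    product_columns = [[sum(e * a for e, a in zip(row, col)) for row in E]
                       for col in columns]
    return [list(r) for r in zip(*product_columns)]

A = [[2, 4, -2], [4, 9, -3], [-2, -3, 7]]
b = [[2], [8], [10]]                      # right side as a 3 by 1 column
E21 = elimination_matrix(3, 2, 1, 2)      # multiplier 4/2 = 2

print(matmul(E21, A))   # row 2 now starts with zero
print(matmul(E21, b))   # the right side changes too
```

Only row 2 changes: it becomes row 2 minus 2 times row 1, on both sides of the equation.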

Fortunately we won’t see all these matrices /;; in later chapters. They are good exam- 
ples to start with, but there are too many. They can combine into one overall matrix E that 
takes all steps at once. The neatest way is to combine all their inverses (E;;)~! into one 
overall matrix L = E~!. Here is the purpose of the next pages. 


1. To see how each step is a matrix multiplication. 
2. To assemble all those steps £;; into one elimination matrix FE. 


3. To see how each E;; is inverted by its inverse matrix Ee. 
4. To assemble all those inverses By (in the right order) into L. 


The special property of L is that all the multipliers 2;; fall into place. Those numbers 
are mixed up in E (forward elimination from A to U). They are perfect in L (undoing 
elimination, returning from U to A). Inverting puts the steps and their matrices Bz in the 
opposite order and that prevents the mixup. 

This section finds the matrices $E_{ij}$. Section 2.4 presents four ways to multiply matrices.
Section 2.5 inverts every step. (For elimination matrices we can already see $E_{ij}^{-1}$ here.)
Then those inverses go into L.
Matrices times Vectors and Ax = b


The 3 by 3 example in the previous section has the short form Ax = b: 


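$$
\begin{aligned} 2x_1 + 4x_2 - 2x_3 &= 2 \\ 4x_1 + 9x_2 - 3x_3 &= 8 \\ -2x_1 - 3x_2 + 7x_3 &= 10 \end{aligned}
\quad \text{is the same as} \quad
\begin{bmatrix} 2 & 4 & -2 \\ 4 & 9 & -3 \\ -2 & -3 & 7 \end{bmatrix}
\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} =
\begin{bmatrix} 2 \\ 8 \\ 10 \end{bmatrix}. \tag{1}
$$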


The nine numbers on the left go into the matrix A. That matrix not only sits beside x.
A multiplies x. The rule for “A times x” is exactly chosen to yield the three equations.


Review of A times x. A matrix times a vector gives a vector. The matrix is square when 
the number of equations (three) matches the number of unknowns (three). Our matrix is 
3 by 3. A general square matrix is n by n. Then the vector x is in n-dimensional space.


$$
\text{The unknown is } x = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}
\text{ and the solution is } x = \begin{bmatrix} -1 \\ 2 \\ 2 \end{bmatrix}.
$$


Key point: Ax = b represents the row form and also the column form of the equations. 


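$$
\text{Column form} \quad Ax = (-1)\begin{bmatrix} 2 \\ 4 \\ -2 \end{bmatrix} + 2\begin{bmatrix} 4 \\ 9 \\ -3 \end{bmatrix} + 2\begin{bmatrix} -2 \\ -3 \\ 7 \end{bmatrix} = \begin{bmatrix} 2 \\ 8 \\ 10 \end{bmatrix} = b. \tag{2}
$$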


Ax is a combination of the columns of A. To compute each component of Ax, we use the
row form of matrix multiplication. Components of Ax are dot products with rows of A.
The short formula for that dot product with x uses “sigma notation”.


The first component of Ax above is (−1)(2) + (2)(4) + (2)(−2).
The ith component of Ax is (row i) · x = $a_{i1}x_1 + a_{i2}x_2 + \cdots + a_{in}x_n$.
This is sometimes written with the sigma symbol as $\sum_{j=1}^{n} a_{ij}x_j$.


$\sum$ is an instruction to add. Start with j = 1 and stop with j = n. The sum
begins with $a_{i1}x_1$ and ends with $a_{in}x_n$. That produces the dot product (row i) · x.
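The two readings of Ax can be checked side by side. The following is a minimal sketch in plain Python, using the matrix and solution from equation (1); the function names `matvec_rows` and `matvec_columns` are illustrative choices, not notation from the text.

```python
# Coefficient matrix and solution taken from equation (1).
A = [[2, 4, -2],
     [4, 9, -3],
     [-2, -3, 7]]
x = [-1, 2, 2]

def matvec_rows(A, x):
    """Row form: component i of Ax is the dot product (row i) . x."""
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in A]

def matvec_columns(A, x):
    """Column form: Ax = x_1 (column 1) + ... + x_n (column n)."""
    result = [0] * len(A)
    for j, x_j in enumerate(x):
        for i in range(len(A)):
            result[i] += x_j * A[i][j]
    return result

b_rows = matvec_rows(A, x)
b_cols = matvec_columns(A, x)
print(b_rows, b_cols)  # both should reproduce b = (2, 8, 10)
```

Both functions do the same multiplications; only the order of the loops differs.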

One point to repeat about matrix notation: The entry in row 1, column 1 (the top left 
corner) is a11. The entry in row 1, column 3 is a13. The entry in row 3, column 1 is a31. 
(Row number comes before column number.) The word “entry” for a matrix corresponds 
to “component” for a vector. General rule: $a_{ij} = A(i, j)$ is in row i, column j.


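Example 1 This matrix has $a_{ij} = 2i + j$. Then $a_{11} = 3$. Also $a_{12} = 4$ and $a_{21} = 5$.
Here is Ax by rows with numbers and letters:


$$
\begin{bmatrix} 3 & 4 \\ 5 & 6 \end{bmatrix}\begin{bmatrix} 2 \\ 1 \end{bmatrix} = \begin{bmatrix} 3 \cdot 2 + 4 \cdot 1 \\ 5 \cdot 2 + 6 \cdot 1 \end{bmatrix}
\qquad
\begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} a_{11}x_1 + a_{12}x_2 \\ a_{21}x_1 + a_{22}x_2 \end{bmatrix}.
$$
A row times a column gives a dot product.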


¹Einstein shortened this even more by omitting the $\sum$. The repeated j in $a_{ij}x_j$
automatically meant addition. He also wrote the sum as $a_i^j x_j$. Not being Einstein, we
include the $\sum$.
The Matrix Form of One Elimination Step


Ax = b is a convenient form for the original equation. What about the elimination steps?
In this example, 2 times the first equation is subtracted from the second equation. On 
the right side, 2 times the first component of b is subtracted from the second component. 


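$$
\text{First step} \quad b = \begin{bmatrix} 2 \\ 8 \\ 10 \end{bmatrix} \text{ changes to } b_{\text{new}} = \begin{bmatrix} 2 \\ 4 \\ 10 \end{bmatrix}.
$$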


We want to do that subtraction with a matrix! The same result $b_{\text{new}} = Eb$ is achieved
when we multiply an “elimination matrix” E times b. It subtracts $2b_1$ from $b_2$:


$$
\text{The elimination matrix is} \quad E = \begin{bmatrix} 1 & 0 & 0 \\ -2 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}.
$$


Multiplication by E subtracts 2 times row 1 from row 2. Rows 1 and 3 stay the same: 


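$$
\begin{bmatrix} 1 & 0 & 0 \\ -2 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}\begin{bmatrix} 2 \\ 8 \\ 10 \end{bmatrix} = \begin{bmatrix} 2 \\ 4 \\ 10 \end{bmatrix}
\qquad
\begin{bmatrix} 1 & 0 & 0 \\ -2 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}\begin{bmatrix} b_1 \\ b_2 \\ b_3 \end{bmatrix} = \begin{bmatrix} b_1 \\ b_2 - 2b_1 \\ b_3 \end{bmatrix}
$$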


The first and third rows of E come from the identity matrix I. They don’t change the first
and third numbers (2 and 10). The new second component is the number 4 that appeared
after the elimination step. This is $b_2 - 2b_1$.


It is easy to describe the “elementary matrices” or “elimination matrices” like this E.
Start with the identity matrix I. Change one of its zeros to the multiplier $-\ell$:


The identity matrix has 1’s on the diagonal and otherwise 0’s. Then Ib = b for all b.


The elementary matrix or elimination matrix $E_{ij}$ has the extra nonzero entry $-\ell$
in the i, j position. Then $E_{ij}$ subtracts a multiple $\ell$ of row j from row i.


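Example 2 The matrix $E_{31}$ has $-\ell$ in the 3, 1 position:


$$
\text{Identity} \quad I = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}
\qquad
\text{Elimination} \quad E_{31} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ -\ell & 0 & 1 \end{bmatrix}
$$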


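When you multiply I times b, you get b. But $E_{31}$ subtracts $\ell$ times the first component
from the third component. With $\ell = 4$ this example gives 9 − 4 = 5:


$$
Ib = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}\begin{bmatrix} 1 \\ 3 \\ 9 \end{bmatrix} = \begin{bmatrix} 1 \\ 3 \\ 9 \end{bmatrix}
\quad \text{and} \quad
Eb = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ -4 & 0 & 1 \end{bmatrix}\begin{bmatrix} 1 \\ 3 \\ 9 \end{bmatrix} = \begin{bmatrix} 1 \\ 3 \\ 5 \end{bmatrix}
$$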
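The recipe "start with I and change one zero to the negative multiplier" is short enough to write out. This is a minimal sketch in plain Python; the helper names `identity`, `elimination_matrix`, and `matvec` are hypothetical, and the indices are 1-based to match the text.

```python
def identity(n):
    """The n by n identity matrix as a list of rows."""
    return [[1 if i == j else 0 for j in range(n)] for i in range(n)]

def elimination_matrix(n, i, j, ell):
    """Identity matrix with -ell placed in row i, column j (1-based)."""
    E = identity(n)
    E[i - 1][j - 1] = -ell
    return E

def matvec(M, v):
    """Matrix times vector, one dot product per row."""
    return [sum(m * c for m, c in zip(row, v)) for row in M]

b = [1, 3, 9]
E31 = elimination_matrix(3, 3, 1, 4)   # subtracts 4 times row 1 from row 3
Ib = matvec(identity(3), b)            # leaves b unchanged
E31b = matvec(E31, b)                  # third component becomes 9 - 4 = 5
print(E31, Ib, E31b)
```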
What about the left side of Ax = b? Both sides will be multiplied by this $E_{31}$.
The purpose of $E_{31}$ is to produce a zero in the (3, 1) position of the matrix.

The notation fits this purpose. Start with A. Apply E’s to produce zeros below the
pivots (the first E is $E_{21}$). End with a triangular U. We now look in detail at those steps.

First a small point. The vector x stays the same. The solution x is not changed by
elimination. (That may be more than a small point.) It is the coefficient matrix that is
changed. When we start with Ax = b and multiply by E, the result is EAx = Eb.
The new matrix EA is the result of multiplying E times A.


Confession The elimination matrices $E_{ij}$ are great examples, but you won’t see them
later. They show how a matrix acts on rows. By taking several elimination steps, we
will see how to multiply matrices (and the order of the E’s becomes important). Products
and inverses are especially clear for E’s. It is those two ideas that the book will use.


Matrix Multiplication 


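The big question is: How do we multiply two matrices? When the first matrix is E,
we know what to expect for EA. This particular E subtracts 2 times row 1 from row 2.
The multiplier is $\ell = 2$:


$$
EA = \begin{bmatrix} 1 & 0 & 0 \\ -2 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}\begin{bmatrix} 2 & 4 & -2 \\ 4 & 9 & -3 \\ -2 & -3 & 7 \end{bmatrix} = \begin{bmatrix} 2 & 4 & -2 \\ 0 & 1 & 1 \\ -2 & -3 & 7 \end{bmatrix} \quad \text{(with the zero)}. \tag{3}
$$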


This step does not change rows 1 and 3 of A. Those rows are unchanged in EA; only
row 2 is different. Twice the first row has been subtracted from the second row. Matrix
multiplication agrees with elimination, and the new system of equations is EAx = Eb.


EAx is simple but it involves a subtle idea. Start with Ax = b. Multiplying both
sides by E gives E(Ax) = Eb. With matrix multiplication, this is also (EA)x = Eb.


The first was E times Ax, the second is EA times x. They are the same.


Parentheses are not needed. We just write EAx.

That rule extends to a matrix C with several column vectors. When multiplying EAC,
you can do AC first or EA first. This is the point of an “associative law” like 3 × (4 × 5) =
(3 × 4) × 5. Multiply 3 times 20, or multiply 12 times 5. Both answers are 60. That law
seems so clear that it is hard to imagine it could be false.

The “commutative law” 3 × 4 = 4 × 3 looks even more obvious. But EA is usually
different from AE. When E multiplies on the right, it acts on the columns of A, not
the rows. AE actually subtracts 2 times column 2 from column 1. So EA ≠ AE.


Associative law is true      A(BC) = (AB)C


Commutative law is false     Often AB ≠ BA
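Both laws can be tested on the E and A of equation (3). The following is a minimal sketch in plain Python, with a hypothetical helper `matmul` that multiplies lists of rows; x is stored as a 3 by 1 matrix so the same helper handles E(Ax) and (EA)x.

```python
def matmul(X, Y):
    """Ordinary product: entry (i, j) is (row i of X) . (column j of Y)."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))]
            for i in range(len(X))]

E = [[1, 0, 0], [-2, 1, 0], [0, 0, 1]]
A = [[2, 4, -2], [4, 9, -3], [-2, -3, 7]]
x = [[-1], [2], [2]]                     # column vector as a 3 by 1 matrix

EA = matmul(E, A)                        # E acts on the rows of A
AE = matmul(A, E)                        # E acts on the columns of A
left = matmul(E, matmul(A, x))           # E(Ax)
right = matmul(matmul(E, A), x)          # (EA)x

print(EA)
print(AE)
print(left == right, EA == AE)
```

In AE the first column of A has lost 2 times the second column, while columns 2 and 3 are untouched. That is why the two products differ.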
There is another requirement on matrix multiplication. Suppose B has only one column
(this column is b). The matrix-matrix law for EB should agree with the matrix-vector
law for Eb. Even more, we should be able to multiply matrices EB a column at a time:


If B has several columns $b_1, b_2, b_3$, then the columns of EB are $Eb_1, Eb_2, Eb_3$.


$$
\text{Matrix multiplication} \quad AB = A[b_1 \ b_2 \ b_3] = [Ab_1 \ Ab_2 \ Ab_3]. \tag{4}
$$


This holds true for the matrix multiplication in (3). If you multiply column 3 of A by 
E, you correctly get column 3 of EA: 


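$$
\begin{bmatrix} 1 & 0 & 0 \\ -2 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}\begin{bmatrix} -2 \\ -3 \\ 7 \end{bmatrix} = \begin{bmatrix} -2 \\ 1 \\ 7 \end{bmatrix}
\qquad E(\text{column } j \text{ of } A) = \text{column } j \text{ of } EA.
$$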


This requirement deals with columns, while elimination is applied to rows. The next 
section describes each entry of every product AB. The beauty of matrix multiplication 
is that all three approaches (rows, columns, whole matrices) come out right. 


The Matrix $P_{ij}$ for a Row Exchange


To subtract row j from row i we use $E_{ij}$. To exchange or “permute” those rows we use
another matrix $P_{ij}$ (a permutation matrix). A row exchange is needed when zero is in the
pivot position. Lower down, that pivot column may contain a nonzero. By exchanging the 
two rows, we have a pivot and elimination goes forward. 

What matrix $P_{23}$ exchanges row 2 with row 3? We can find it by exchanging rows of
the identity matrix I:


$$
\text{Permutation matrix} \quad P_{23} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 0 & 1 \\ 0 & 1 & 0 \end{bmatrix}
$$


This is a row exchange matrix. Multiplying by P23 exchanges components 2 and 3 of any 
column vector. Therefore it also exchanges rows 2 and 3 of any matrix: 


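$$
\begin{bmatrix} 1 & 0 & 0 \\ 0 & 0 & 1 \\ 0 & 1 & 0 \end{bmatrix}\begin{bmatrix} 1 \\ 3 \\ 5 \end{bmatrix} = \begin{bmatrix} 1 \\ 5 \\ 3 \end{bmatrix}
\quad \text{and} \quad
\begin{bmatrix} 1 & 0 & 0 \\ 0 & 0 & 1 \\ 0 & 1 & 0 \end{bmatrix}\begin{bmatrix} 2 & 4 & 1 \\ 0 & 0 & 3 \\ 0 & 6 & 5 \end{bmatrix} = \begin{bmatrix} 2 & 4 & 1 \\ 0 & 6 & 5 \\ 0 & 0 & 3 \end{bmatrix}
$$


On the right, $P_{23}$ is doing what it was created for. With zero in the second pivot position
and “6” below it, the exchange puts 6 into the pivot.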
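A row exchange matrix can be built exactly as described: exchange two rows of I. This is a minimal sketch in plain Python; `exchange_matrix` and `matmul` are hypothetical helper names, with 1-based row numbers as in the text.

```python
def exchange_matrix(n, i, j):
    """Identity matrix with rows i and j exchanged (1-based)."""
    P = [[1 if r == c else 0 for c in range(n)] for r in range(n)]
    P[i - 1], P[j - 1] = P[j - 1], P[i - 1]
    return P

def matmul(X, Y):
    """Ordinary product of two matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))]
            for i in range(len(X))]

P23 = exchange_matrix(3, 2, 3)
column = [[1], [3], [5]]
A = [[2, 4, 1], [0, 0, 3], [0, 6, 5]]

swapped_column = matmul(P23, column)   # components 2 and 3 trade places
swapped_rows = matmul(P23, A)          # rows 2 and 3 trade places
print(P23, swapped_column, swapped_rows)
```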
Matrices act. They don’t just sit there. We will soon meet other permutation matrices,
which can change the order of several rows. Rows 1, 2, 3 can be moved to 3, 1, 2. Our P23 
is one particular permutation matrix—it exchanges rows 2 and 3. 


Row Exchange Matrix $P_{ij}$ is the identity matrix with rows i and j reversed.
When this “permutation matrix” $P_{ij}$ multiplies a matrix, it exchanges rows i and j.


To exchange equations 1 and 3 multiply by $P_{13} = \begin{bmatrix} 0 & 0 & 1 \\ 0 & 1 & 0 \\ 1 & 0 & 0 \end{bmatrix}$.


Usually row exchanges are not required. The odds are good that elimination uses only
the $E_{ij}$. But the $P_{ij}$ are ready if needed, to move a pivot up to the diagonal.


The Augmented Matrix 


This book eventually goes far beyond elimination. Matrices have all kinds of practical 
applications, in which they are multiplied. Our best starting point was a square E times a
square A, because we met this in elimination—and we know what answer to expect for EA. 
The next step is to allow a rectangular matrix. It still comes from our original equations, 
but now it includes the right side b. 

Key idea: Elimination does the same row operations to A and to b. We can include 
b as an extra column and follow it through elimination. The matrix A is enlarged or 
“augmented” by the extra column b: 


$$
\text{Augmented matrix} \quad [A \ b] = \begin{bmatrix} 2 & 4 & -2 & 2 \\ 4 & 9 & -3 & 8 \\ -2 & -3 & 7 & 10 \end{bmatrix}
$$


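Elimination acts on whole rows of this matrix. The left side and right side are both
multiplied by E, to subtract 2 times equation 1 from equation 2. With [A b] those steps
happen together:


$$
\begin{bmatrix} 1 & 0 & 0 \\ -2 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}\begin{bmatrix} 2 & 4 & -2 & 2 \\ 4 & 9 & -3 & 8 \\ -2 & -3 & 7 & 10 \end{bmatrix} = \begin{bmatrix} 2 & 4 & -2 & 2 \\ 0 & 1 & 1 & 4 \\ -2 & -3 & 7 & 10 \end{bmatrix}
$$


The new second row contains 0, 1, 1, 4. The new second equation is $x_2 + x_3 = 4$. Matrix
multiplication works by rows and at the same time by columns: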


ROWS       Each row of E acts on [A b] to give a row of [EA Eb].
COLUMNS    E acts on each column of [A b] to give a column of [EA Eb].
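The statement that E acts on [A b] all at once can be checked numerically. This is a minimal sketch in plain Python for the running 3 by 3 example; `matmul` is a hypothetical helper, and the augmented matrix is formed by appending each component of b to the matching row of A.

```python
def matmul(X, Y):
    """Ordinary product of two matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))]
            for i in range(len(X))]

A = [[2, 4, -2], [4, 9, -3], [-2, -3, 7]]
b = [2, 8, 10]
E = [[1, 0, 0], [-2, 1, 0], [0, 0, 1]]

augmented = [row + [b_i] for row, b_i in zip(A, b)]   # [A b], 3 by 4
stepped = matmul(E, augmented)                        # E[A b]

# Column view: the last column of E[A b] is Eb, the first three are EA.
EA = matmul(E, A)
Eb = [row[0] for row in matmul(E, [[b_i] for b_i in b])]
rebuilt = [row + [c] for row, c in zip(EA, Eb)]       # [EA Eb]
print(stepped)
```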
Notice again that word “acts.” This is essential. Matrices do something! The matrix
A acts on x to produce b. The matrix E operates on A to give EA. The whole process of
elimination is a sequence of row operations, alias matrix multiplications. A goes to $E_{21}A$
which goes to $E_{31}E_{21}A$. Finally $E_{32}E_{31}E_{21}A$ is a triangular matrix.

The right side is included in the augmented matrix. The end result is a triangular system
of equations. We stop for exercises on multiplication by E, before writing down the rules
for all matrix multiplications (including block multiplication).


= REVIEW OF THE KEY IDEAS = 


1. Ax = $x_1$ times column 1 + ··· + $x_n$ times column n. And $(Ax)_i = \sum_{j=1}^{n} a_{ij}x_j$.
2. Identity matrix = I, elimination matrix = $E_{ij}$ using $\ell_{ij}$, exchange matrix = $P_{ij}$.


3. Multiplying Ax = b by $E_{21}$ subtracts a multiple $\ell_{21}$ of equation 1 from equation 2.
The number $-\ell_{21}$ is the (2, 1) entry of the elimination matrix $E_{21}$.


4. For the augmented matrix [A b], that elimination step gives $[E_{21}A \ \ E_{21}b]$.


5. When A multiplies any matrix B, it multiplies each column of B separately. 


= WORKED EXAMPLES = 


2.3 A What 3 by 3 matrix $E_{21}$ subtracts 4 times row 1 from row 2? What matrix $P_{32}$
exchanges row 2 and row 3? If you multiply A on the right instead of the left, describe the
results $AE_{21}$ and $AP_{32}$.


Solution By doing those operations on the identity matrix I, we find


$$
E_{21} = \begin{bmatrix} 1 & 0 & 0 \\ -4 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}
\quad \text{and} \quad
P_{32} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 0 & 1 \\ 0 & 1 & 0 \end{bmatrix}.
$$


Multiplying by $E_{21}$ on the right side will subtract 4 times column 2 from column 1.
Multiplying by P32 on the right will exchange columns 2 and 3. 


2.3 B Write down the augmented matrix [A b] with an extra column:


$$
\begin{aligned} x + 2y + 2z &= 1 \\ 4x + 8y + 9z &= 3 \\ 3y + 2z &= 1 \end{aligned}
$$


Apply $E_{21}$ and then $P_{32}$ to reach a triangular system. Solve by back substitution. What
combined matrix $P_{32}E_{21}$ will do both steps at once?
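Solution $E_{21}$ removes the 4 in column 1. But zero also appears in column 2:


$$
[A \ b] = \begin{bmatrix} 1 & 2 & 2 & 1 \\ 4 & 8 & 9 & 3 \\ 0 & 3 & 2 & 1 \end{bmatrix}
\quad \text{and} \quad
E_{21}[A \ b] = \begin{bmatrix} 1 & 2 & 2 & 1 \\ 0 & 0 & 1 & -1 \\ 0 & 3 & 2 & 1 \end{bmatrix}
$$
Now $P_{32}$ exchanges rows 2 and 3. Back substitution produces z then y and x.
$$
P_{32}E_{21}[A \ b] = \begin{bmatrix} 1 & 2 & 2 & 1 \\ 0 & 3 & 2 & 1 \\ 0 & 0 & 1 & -1 \end{bmatrix}
\quad \text{and} \quad
\begin{bmatrix} x \\ y \\ z \end{bmatrix} = \begin{bmatrix} 1 \\ 1 \\ -1 \end{bmatrix}
$$
For the matrix $P_{32}E_{21}$ that does both steps at once, apply $P_{32}$ to $E_{21}$.
$$
\text{One matrix, both steps} \quad P_{32}E_{21} = \text{exchange the rows of } E_{21} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 0 & 1 \\ -4 & 1 & 0 \end{bmatrix}.
$$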
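The whole of Example 2.3 B fits in a few lines of code. The following is a minimal sketch in plain Python, assuming exact arithmetic with `fractions.Fraction`; the helpers `matmul` and `back_substitute` are hypothetical names, and `back_substitute` assumes the triangular system has nonzero pivots.

```python
from fractions import Fraction

def matmul(X, Y):
    """Ordinary product of two matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))]
            for i in range(len(X))]

def back_substitute(U_aug):
    """Solve an upper triangular augmented system [U c], last unknown first."""
    n = len(U_aug)
    x = [Fraction(0)] * n
    for i in range(n - 1, -1, -1):
        known = sum(U_aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = Fraction(U_aug[i][n] - known) / U_aug[i][i]
    return x

aug = [[1, 2, 2, 1], [4, 8, 9, 3], [0, 3, 2, 1]]   # [A b]
E21 = [[1, 0, 0], [-4, 1, 0], [0, 0, 1]]
P32 = [[1, 0, 0], [0, 0, 1], [0, 1, 0]]

M = matmul(P32, E21)             # one matrix that does both steps
triangular = matmul(M, aug)      # the same as P32 (E21 [A b])
solution = back_substitute(triangular)
print(M, triangular, solution)
```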


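2.3 C Multiply these matrices in two ways. First, rows of A times columns of B.
Second, columns of A times rows of B. That unusual way produces two matrices that
add to AB. How many separate ordinary multiplications are needed?


$$
\text{Both ways} \quad AB = \begin{bmatrix} 3 & 4 \\ 1 & 5 \\ 2 & 0 \end{bmatrix}\begin{bmatrix} 2 & 4 \\ 1 & 1 \end{bmatrix} = \begin{bmatrix} 10 & 16 \\ 7 & 9 \\ 4 & 8 \end{bmatrix}
$$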


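Solution Rows of A times columns of B are dot products of vectors:


$$
(\text{row } 1) \cdot (\text{column } 1) = \begin{bmatrix} 3 & 4 \end{bmatrix}\begin{bmatrix} 2 \\ 1 \end{bmatrix} = 10 \quad \text{is the (1, 1) entry of } AB
$$


$$
(\text{row } 2) \cdot (\text{column } 1) = \begin{bmatrix} 1 & 5 \end{bmatrix}\begin{bmatrix} 2 \\ 1 \end{bmatrix} = 7 \quad \text{is the (2, 1) entry of } AB
$$


We need 6 dot products, 2 multiplications each, 12 in all (3 · 2 · 2). The same AB comes
from columns of A times rows of B. A column times a row is a matrix.


$$
AB = \begin{bmatrix} 3 \\ 1 \\ 2 \end{bmatrix}\begin{bmatrix} 2 & 4 \end{bmatrix} + \begin{bmatrix} 4 \\ 5 \\ 0 \end{bmatrix}\begin{bmatrix} 1 & 1 \end{bmatrix} = \begin{bmatrix} 6 & 12 \\ 2 & 4 \\ 4 & 8 \end{bmatrix} + \begin{bmatrix} 4 & 4 \\ 5 & 5 \\ 0 & 0 \end{bmatrix}.
$$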
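Both ways of computing this AB can be run side by side. This is a minimal sketch in plain Python for the matrices of Example 2.3 C; `rows_times_columns`, `outer`, and `columns_times_rows` are hypothetical helper names.

```python
def rows_times_columns(A, B):
    """Usual way: entry (i, j) is (row i of A) . (column j of B)."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))]
            for i in range(len(A))]

def outer(column, row):
    """A column times a row is a full matrix."""
    return [[c * r for r in row] for c in column]

def columns_times_rows(A, B):
    """Add the matrices (column k of A)(row k of B) for every k."""
    total = [[0] * len(B[0]) for _ in range(len(A))]
    pieces = []
    for k in range(len(B)):
        piece = outer([row[k] for row in A], B[k])
        pieces.append(piece)
        total = [[t + p for t, p in zip(t_row, p_row)]
                 for t_row, p_row in zip(total, piece)]
    return total, pieces

A = [[3, 4], [1, 5], [2, 0]]
B = [[2, 4], [1, 1]]

usual = rows_times_columns(A, B)
summed, pieces = columns_times_rows(A, B)
print(usual, pieces)
```

The same 12 multiplications appear in both versions, grouped in a different order.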
Problem Set 2.3


Problems 1-15 are about elimination matrices.


1 Write down the 3 by 3 matrices that produce these elimination steps:

(a) $E_{21}$ subtracts 5 times row 1 from row 2.
(b) $E_{32}$ subtracts −7 times row 2 from row 3.
(c) P exchanges rows 1 and 2, then rows 2 and 3.


2 In Problem 1, applying $E_{21}$ and then $E_{32}$ to b = (1, 0, 0) gives $E_{32}E_{21}b$ = ____.
Applying $E_{32}$ before $E_{21}$ gives $E_{21}E_{32}b$ = ____. When $E_{32}$ comes first,
row ____ feels no effect from row ____.


3 Which three matrices $E_{21}$, $E_{31}$, $E_{32}$ put A into triangular form U?

$$
A = \begin{bmatrix} 1 & 1 & 0 \\ 4 & 6 & 1 \\ -2 & 2 & 0 \end{bmatrix} \quad \text{and} \quad E_{32}E_{31}E_{21}A = U.
$$

Multiply those E’s to get one matrix M that does elimination: MA = U.


4 Include b = (1, 0, 0) as a fourth column in Problem 3 to produce [A b]. Carry out
the elimination steps on this augmented matrix to solve Ax = b.


5 Suppose $a_{33} = 7$ and the third pivot is 5. If you change $a_{33}$ to 11, the third pivot is
____. If you change $a_{33}$ to ____, there is no third pivot.


6 If every column of A is a multiple of (1, 1, 1), then Ax is always a multiple of
(1, 1, 1). Do a 3 by 3 example. How many pivots are produced by elimination?


7 Suppose E subtracts 7 times row 1 from row 3.

(a) To invert that step you should ____ 7 times row ____ to row ____.
(b) What “inverse matrix” $E^{-1}$ takes that reverse step (so $E^{-1}E = I$)?
(c) If the reverse step is applied first (and then E) show that $EE^{-1} = I$.


8 The determinant of $M = \begin{bmatrix} a & b \\ c & d \end{bmatrix}$ is det M = ad − bc. Subtract $\ell$ times row 1
from row 2 to produce a new M*. Show that det M* = det M for every $\ell$. When
$\ell = c/a$, the product of pivots equals the determinant: $(a)(d - \ell b)$ equals ad − bc.


9 (a) $E_{21}$ subtracts row 1 from row 2 and then $P_{23}$ exchanges rows 2 and 3. What
matrix $M = P_{23}E_{21}$ does both steps at once?

(b) $P_{23}$ exchanges rows 2 and 3 and then $E_{31}$ subtracts row 1 from row 3. What
matrix $M = E_{31}P_{23}$ does both steps at once? Explain why the M’s are the
same but the E’s are different.
10 (a) What 3 by 3 matrix $E_{13}$ will add row 3 to row 1?
(b) What matrix adds row 1 to row 3 and at the same time row 3 to row 1?
(c) What matrix adds row 1 to row 3 and then adds row 3 to row 1?


11 Create a matrix that has $a_{11} = a_{22} = a_{33} = 1$ but elimination produces two negative
pivots without row exchanges. (The first pivot is 1.)


12 Multiply these matrices:

$$
\begin{bmatrix} 0 & 0 & 1 \\ 0 & 1 & 0 \\ 1 & 0 & 0 \end{bmatrix}\begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 9 \end{bmatrix}\begin{bmatrix} 0 & 0 & 1 \\ 0 & 1 & 0 \\ 1 & 0 & 0 \end{bmatrix}
\qquad
\begin{bmatrix} 1 & 0 & 0 \\ -1 & 1 & 0 \\ -1 & 0 & 1 \end{bmatrix}\begin{bmatrix} 1 & 2 & 3 \\ 1 & 3 & 1 \\ 1 & 4 & 0 \end{bmatrix}.
$$


13 Explain these facts. If the third column of B is all zero, the third column of EB is
all zero (for any E). If the third row of B is all zero, the third row of EB might not
be zero.


14 This 4 by 4 matrix will need elimination matrices $E_{21}$ and $E_{32}$ and $E_{43}$. What are
those matrices?

$$
A = \begin{bmatrix} 2 & -1 & 0 & 0 \\ -1 & 2 & -1 & 0 \\ 0 & -1 & 2 & -1 \\ 0 & 0 & -1 & 2 \end{bmatrix}
$$


15 Write down the 3 by 3 matrix that has $a_{ij} = 2i - 3j$. This matrix has $a_{32} = 0$, but
elimination still needs $E_{32}$ to produce a zero in the 3, 2 position. Which previous
step destroys the original zero and what is $E_{32}$?


Problems 16-23 are about creating and multiplying matrices.


16 Write these ancient problems in a 2 by 2 matrix form Ax = b and solve them:

(a) X is twice as old as Y and their ages add to 33.
(b) (x, y) = (2, 5) and (3, 7) lie on the line y = mx + c. Find m and c.


17 The parabola $y = a + bx + cx^2$ goes through the points (x, y) = (1, 4) and (2, 8)
and (3, 14). Find and solve a matrix equation for the unknowns (a, b, c).


18 Multiply these matrices in the orders EF and FE:

$$
E = \begin{bmatrix} 1 & 0 & 0 \\ a & 1 & 0 \\ b & 0 & 1 \end{bmatrix} \qquad F = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & c & 1 \end{bmatrix}
$$

Also compute $E^2 = EE$ and $F^3 = FFF$. You can guess $F^{100}$.


19 Multiply these row exchange matrices in the orders PQ and QP and $P^2$:

$$
P = \begin{bmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{bmatrix} \quad \text{and} \quad Q = \begin{bmatrix} 0 & 0 & 1 \\ 0 & 1 & 0 \\ 1 & 0 & 0 \end{bmatrix}.
$$

Find another non-diagonal matrix whose square is $M^2 = I$.


20 (a) Suppose all columns of B are the same. Then all columns of EB are the same,
because each one is E times ____.
(b) Suppose all rows of B are [1 2 4]. Show by example that all rows of EB are
not [1 2 4]. It is true that those rows are ____.


21 If E adds row 1 to row 2 and F adds row 2 to row 1, does EF equal FE?


22 The entries of A and x are $a_{ij}$ and $x_j$. So the first component of Ax is $\sum a_{1j}x_j$ =
$a_{11}x_1 + \cdots + a_{1n}x_n$. If $E_{21}$ subtracts row 1 from row 2, write a formula for

(a) the third component of Ax
(b) the (2, 1) entry of $E_{21}A$
(c) the (2, 1) entry of $E_{21}(E_{21}A)$
(d) the first component of $E_{21}Ax$.


23 The elimination matrix $E = \begin{bmatrix} 1 & 0 \\ -2 & 1 \end{bmatrix}$ subtracts 2 times row 1 of A from row 2 of A.
The result is EA. What is the effect of E(EA)? In the opposite order AE, we are
subtracting 2 times ____ of A from ____. (Do examples.)

Problems 24-27 include the column b in the augmented matrix [A b].


24 Apply elimination to the 2 by 3 augmented matrix [A b]. What is the triangular
system Ux = c? What is the solution x?

[The 2 by 2 system Ax = b for this problem is illegible in the source.]


25 Apply elimination to the 3 by 4 augmented matrix [A b]. How do you know this
system has no solution? Change the last number 6 so there is a solution.

[The 3 by 3 system Ax = b for this problem is illegible in the source.]


26 The equations Ax = b and Ax* = b* have the same matrix A. What double
augmented matrix should you use in elimination to solve both equations at once?

Solve both of these equations by working on a 2 by 4 matrix:

[The two 2 by 2 systems for this problem are illegible in the source.]
27 Choose the numbers a, b, c, d in this augmented matrix so that there is (a) no solution
(b) infinitely many solutions.


$$
[A \ b] = \begin{bmatrix} 1 & 2 & 3 & a \\ 0 & 4 & 5 & b \\ 0 & 0 & d & c \end{bmatrix}
$$
Which of the numbers a, b, c, or d have no effect on the solvability?


28 If AB = I and BC = I use the associative law to prove A = C. 
Challenge Problems 


29 Find the triangular matrix E that reduces “Pascal’s matrix” to a smaller Pascal:


$$
\text{Elimination on column 1} \quad E\begin{bmatrix} 1 & 0 & 0 & 0 \\ 1 & 1 & 0 & 0 \\ 1 & 2 & 1 & 0 \\ 1 & 3 & 3 & 1 \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 1 & 1 & 0 \\ 0 & 1 & 2 & 1 \end{bmatrix}.
$$


Which matrix M (multiplying several E’s) reduces Pascal all the way to I?
Pascal’s triangular matrix is exceptional, all of its multipliers are $\ell_{ij} = 1$.


30 Write M = [3 4] asa product of many factors A = |} 9 ] and B = [33]. 


(a) What matrix E subtracts row 1 from row 2 to make row 2 of EM smaller? 
(b) What matrix F subtracts row 2 of EM from row 1 to reduce row 1 of FEM? 
(c) Continue E’s and F’s until (many E’s and F’s) times (M) is (A or B).


(d) E and F are the inverses of A and B! Moving all E’s and F’s to the right side 
will give you the desired result M = product of A’s and B’s. 


This is possible for integer matrices $M = \begin{bmatrix} a & b \\ c & d \end{bmatrix} > 0$ that have ad − bc = 1.


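31 Find elimination matrices $E_{21}$ then $E_{32}$ then $E_{43}$ to change K into U:


$$
E_{43}E_{32}E_{21}\begin{bmatrix} 1 & 0 & 0 & 0 \\ -a & 1 & 0 & 0 \\ 0 & -b & 1 & 0 \\ 0 & 0 & -c & 1 \end{bmatrix} = I.
$$


Apply those three steps to the identity matrix I, to multiply $E_{43}E_{32}E_{21}$.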
2.4 Rules for Matrix Operations


1 Matrices A with n columns multiply matrices B with n rows: $A_{m \times n} B_{n \times p} = C_{m \times p}$.


2 Each entry in AB = C is a dot product: $C_{ij}$ = (row i of A) · (column j of B).
3 This rule is chosen so that AB times C equals A times BC. And (AB)x = A(Bx).
4 More ways to compute AB: (A times columns of B) (rows of A times B) (columns times rows).


5 It is not usually true that AB = BA. In most cases A doesn’t commute with B.


6 Matrices can be multiplied by blocks: $A = [A_1 \ A_2]$ times $B = \begin{bmatrix} B_1 \\ B_2 \end{bmatrix}$ is $A_1B_1 + A_2B_2$.


I will start with basic facts. A matrix is a rectangular array of numbers or “entries”. 
When A has m rows and n columns, it is an “m by n” matrix. Matrices can be added if 
their shapes are the same. They can be multiplied by any constant c. Here are examples of 
A + B and 2A, for 3 by 2 matrices: 


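$$
\begin{bmatrix} 1 & 2 \\ 3 & 4 \\ 0 & 0 \end{bmatrix} + \begin{bmatrix} 2 & 2 \\ 4 & 4 \\ 9 & 9 \end{bmatrix} = \begin{bmatrix} 3 & 4 \\ 7 & 8 \\ 9 & 9 \end{bmatrix}
\quad \text{and} \quad
2\begin{bmatrix} 1 & 2 \\ 3 & 4 \\ 0 & 0 \end{bmatrix} = \begin{bmatrix} 2 & 4 \\ 6 & 8 \\ 0 & 0 \end{bmatrix}.
$$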


Matrices are added exactly as vectors are—one entry at a time. We could even regard a 
column vector as a matrix with only one column (so n = 1). The matrix —A comes from 
multiplication by c = —1 (reversing all the signs). Adding A to —A leaves the zero matrix, 
with all entries zero. All this is only common sense. 

The entry in row i and column j is called $a_{ij}$ or A(i, j). The n entries along the first
row are $a_{11}, a_{12}, \ldots, a_{1n}$. The lower left entry in the matrix is $a_{m1}$ and the lower right is
$a_{mn}$. The row number i goes from 1 to m. The column number j goes from 1 to n.


Matrix addition is easy. The serious question is matrix multiplication. When can we 
multiply A times B, and what is the product AB? This section gives 4 ways to find AB. 
But we cannot multiply when A and B are 3 by 2. They don’t pass the following test: 


To multiply AB: If A has n columns, B must have n rows. 


When A is 3 by 2, the matrix B can be 2 by 1 (a vector) or 2 by 2 (square) or 2 by 20. 
Every column of B is multiplied by A. I will begin matrix multiplication the dot product 
way, and return to this column way: A times columns of B. Both ways follow this rule: 


Fundamental Law of Matrix Multiplication AB times C equals A times BC (1) 


The parentheses can move safely in (AB)C = A(BC). Linear algebra depends on this law. 
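Suppose A is m by n and B is n by p. We can multiply. The product AB is m by p.


$$
(m \times n)(n \times p) = (m \times p) \qquad \begin{bmatrix} m \text{ rows} \\ n \text{ columns} \end{bmatrix}\begin{bmatrix} n \text{ rows} \\ p \text{ columns} \end{bmatrix} = \begin{bmatrix} m \text{ rows} \\ p \text{ columns} \end{bmatrix}
$$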


A row times a column is an extreme case. Then 1 by n multiplies n by 1. The result will 
be 1 by 1. That single number is the “dot product”. 

In every case AB is filled with dot products. For the top corner, the (1,1) entry of AB
is (row 1 of A) · (column 1 of B). This is the first way, and the usual way, to multiply
matrices. Take the dot product of each row of A with each column of B.


1. The entry in row i and column j of AB is (row i of A) · (column j of B).


Figure 2.8 picks out the second row (i = 2) of a 4 by 5 matrix A. It picks out the third
column (j = 3) of a 5 by 6 matrix B. Their dot product goes into row 2 and column 3
of AB. The matrix AB has as many rows as A (4 rows), and as many columns as B.


$$
\begin{bmatrix} a_{i1} & a_{i2} & \cdots & a_{i5} \end{bmatrix}\begin{bmatrix} b_{1j} \\ b_{2j} \\ \vdots \\ b_{5j} \end{bmatrix} = (AB)_{ij}
$$
A is 4 by 5      B is 5 by 6      AB is (4 × 5)(5 × 6) = 4 by 6


Figure 2.8: Here i = 2 and j = 3. Then $(AB)_{23}$ is (row 2) · (column 3) = sum of $a_{2k}b_{k3}$.


Example 1 Square matrices can be multiplied if and only if they have the same size: 


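$$
\begin{bmatrix} 1 & 1 \\ 2 & -1 \end{bmatrix}\begin{bmatrix} 2 & 2 \\ 3 & 4 \end{bmatrix} = \begin{bmatrix} 5 & 6 \\ 1 & 0 \end{bmatrix}.
$$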


The first dot product is 1- 2 + 1-3 = 5. Three more dot products give 6,1, and 0. Each 
dot product requires two multiplications—thus eight in all. 

If A and B are n by n, so is AB. It contains $n^2$ dot products, row of A times column of
B. Each dot product needs n multiplications, so the computation of AB uses $n^3$ separate
multiplications. For n = 100 we multiply a million times. For n = 2 we have $n^3 = 8$.


Mathematicians thought until recently that AB absolutely needed $2^3 = 8$ multiplications.
Then somebody found a way to do it with 7 (and extra additions). By breaking n by
n matrices into 2 by 2 blocks, this idea also reduced the count to multiply large matrices.
Instead of $n^3$ multiplications the count has now dropped to $n^{2.376}$. Maybe $n^2$ is possible?


But the algorithms are so awkward that scientific computing is done the regular $n^3$ way.
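The count of multiplications in the regular way is easy to confirm by counting them as they happen. This is a minimal sketch in plain Python; `matmul_counting` is a hypothetical name, and it implements only the ordinary dot-product method (not the 7-multiplication idea mentioned above).

```python
def matmul_counting(A, B):
    """Dot-product way. Returns AB and the number of scalar multiplications."""
    m, n, p = len(A), len(B), len(B[0])
    if any(len(row) != n for row in A):
        raise ValueError("A must have as many columns as B has rows")
    C = [[0] * p for _ in range(m)]
    count = 0
    for i in range(m):
        for j in range(p):
            for k in range(n):
                C[i][j] += A[i][k] * B[k][j]
                count += 1
    return C, count

# The 2 by 2 product of Example 1: four dot products, two multiplications each.
C, count = matmul_counting([[1, 1], [2, -1]], [[2, 2], [3, 4]])
print(C, count)

# A 3 by 3 product uses 3 * 3 * 3 multiplications.
I3 = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
_, count3 = matmul_counting(I3, I3)
print(count3)
```

For an m by n matrix times an n by p matrix the counter reaches mnp, which is $n^3$ when all three sizes equal n.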
Example 2 Suppose A is a row vector (1 by 3) and B is a column vector (3 by 1). Then
AB is 1 by 1 (only one entry, the dot product). On the other hand B times A (a column
times a row) is a full 3 by 3 matrix. This multiplication is allowed!


$$
\text{Column times row} \quad (n \times 1)(1 \times n) = (n \times n) \qquad \begin{bmatrix} 0 \\ 1 \\ 2 \end{bmatrix}\begin{bmatrix} 1 & 2 & 3 \end{bmatrix} = \begin{bmatrix} 0 & 0 & 0 \\ 1 & 2 & 3 \\ 2 & 4 & 6 \end{bmatrix}
$$


A row times a column is an “inner” product, which is another name for dot product.
A column times a row is an “outer” product. These are extreme cases of matrix multiplication.


The Second and Third Ways: Rows and Columns 


In the big picture, A multiplies each column of B. The result is a column of AB. In that 
column, we are combining the columns of A. Each column of AB is a combination of 
the columns of A. That is the column picture of matrix multiplication: 


2. Matrix A times every column of B      $A[b_1 \ \cdots \ b_p] = [Ab_1 \ \cdots \ Ab_p]$.


The row picture is reversed. Each row of A multiplies the whole matrix B. The result
is a row of AB. Every row of AB is a combination of the rows of B:


3. Every row of A times matrix B      $[\text{row } i \text{ of } A]\begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 9 \end{bmatrix} = [\text{row } i \text{ of } AB]$.


We see row operations in elimination (E times A). Soon we see columns in $AA^{-1} = I$.
The “row-column picture” has the dot products of rows with columns. Dot products are
the usual way to multiply matrices by hand: mnp separate steps of multiply/add.


AB =(m x n)(n x p) =(m x p) mp dot products with n steps each (2) 


The Fourth Way: Columns Multiply Rows 


There is a fourth way to multiply matrices. Not many people realize how important this is. 
I feel like a magician explaining a trick. Magicians won’t do it but mathematicians try. 
The fourth way was in previous editions of this book, but I didn’t emphasize it enough. 


4. Multiply columns 1 to n of A times rows 1 to n of B. Add those matrices. 


Column 1 of A multiplies row 1 of B. Columns 2 and 3 multiply rows 2 and 3. Then add: 


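$$
\begin{bmatrix} \text{col 1} & \text{col 2} & \text{col 3} \end{bmatrix}\begin{bmatrix} \text{row 1} \\ \text{row 2} \\ \text{row 3} \end{bmatrix} = (\text{col 1})(\text{row 1}) + (\text{col 2})(\text{row 2}) + (\text{col 3})(\text{row 3}).
$$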




If I multiply 2 by 2 matrices this column-row way, you will see that AB is correct. 


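$$
AB = \begin{bmatrix} a & b \\ c & d \end{bmatrix}\begin{bmatrix} E & F \\ G & H \end{bmatrix} = \begin{bmatrix} aE + bG & aF + bH \\ cE + dG & cF + dH \end{bmatrix}
$$
$$
\text{Add columns of } A \text{ times rows of } B \qquad AB = \begin{bmatrix} a \\ c \end{bmatrix}\begin{bmatrix} E & F \end{bmatrix} + \begin{bmatrix} b \\ d \end{bmatrix}\begin{bmatrix} G & H \end{bmatrix} \tag{3}
$$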


Column k of A multiplies row k of B. That gives a matrix (not just a number). Then you 
add those matrices for k = 1,2,...,n to produce AB. 

If AB is (m by n) (n by p) then n matrices will be (column) (row). They are all m by p. 
This uses the same mnp steps as in the dot products—but in a new order. 


The Laws for Matrix Operations 


May I put on record six laws that matrices do obey, while emphasizing a rule they don’t 
obey? The matrices can be square or rectangular, and the laws involving A + B are all 
simple and all obeyed. Here are three addition laws: 


A+B=B+A (commutative law) 
c(A+ B)=cA+cB (distributive law) 
A+(B+C)=(A+B)+C (associative law).


Three more laws hold for multiplication, but AB = BA is not one of them: 


AB ≠ BA (the commutative “law” is usually broken)


A(B+C)=AB+AC (distributive law from the left) 
(A+ B)C=AC+ BC (distributive law from the right) 


A(BC) =(AB)C (associative law for ABC ) (parentheses not needed ). 


When A and B are not square, AB is a different size from BA. These matrices can’t be 
equal—even if both multiplications are allowed. For square matrices, almost any example 
shows that AB is different from BA: 


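$$
AB = \begin{bmatrix} 0 & 0 \\ 1 & 0 \end{bmatrix}\begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix} = \begin{bmatrix} 0 & 0 \\ 0 & 1 \end{bmatrix}
\quad \text{but} \quad
BA = \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix}\begin{bmatrix} 0 & 0 \\ 1 & 0 \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix}.
$$
It is true that AI = IA. All square matrices commute with I and also with cI. Only these
matrices cI commute with all other matrices.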


The law A (B + C) = AB + AC is proved a column at a time. Start with A(b+c) = 
Ab + Ac for the first column. That is the key to everything—linearity. Say no more. 


The law A(BC) = (AB)C means that you can multiply BC first or else AB first. 
The direct proof is sort of awkward (Problem 37) but this law is extremely useful. 
We highlighted it above; it is the key to the way we multiply matrices. 
Look at the special case when A = B = C = square matrix. Then (A times $A^2$) is
equal to ($A^2$ times A). The product in either order is $A^3$. The matrix powers $A^p$ follow the
same rules as numbers:


$$
A^p = AAA \cdots A \ (p \text{ factors}) \qquad (A^p)(A^q) = A^{p+q} \qquad (A^p)^q = A^{pq}.
$$


Those are the ordinary laws for exponents. $A^3$ times $A^4$ is $A^7$ (seven factors). But the
fourth power of $A^3$ is $A^{12}$ (twelve A’s). When p and q are zero or negative these rules still
hold, provided A has a “−1 power”, which is the inverse matrix $A^{-1}$. Then $A^0 = I$ is the
identity matrix in analogy with $2^0 = 1$.

For a number, $a^{-1}$ is 1/a. For a matrix, the inverse is written $A^{-1}$. (It is not I/A,
except in MATLAB.) Every number has an inverse except a = 0. To decide when A has
an inverse is a central problem in linear algebra. Section 2.5 will start on the answer. This 
section is a Bill of Rights for matrices, to say when A and B can be multiplied and how. 
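The exponent laws can be tried on a small example. This is a minimal sketch in plain Python for nonnegative powers only; `matmul` and `power` are hypothetical helper names, and the 2 by 2 matrix is an arbitrary choice made for illustration.

```python
def matmul(X, Y):
    """Ordinary product of two matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))]
            for i in range(len(X))]

def power(A, p):
    """A multiplied by itself p times, with power(A, 0) equal to I."""
    n = len(A)
    result = [[1 if i == j else 0 for j in range(n)] for i in range(n)]
    for _ in range(p):
        result = matmul(result, A)
    return result

A = [[1, 1], [0, 1]]                                  # illustrative matrix

product_rule = matmul(power(A, 3), power(A, 4))       # should equal A^7
power_rule = power(power(A, 3), 4)                    # should equal A^12
print(product_rule, power_rule, power(A, 0))
```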


Block Matrices and Block Multiplication 


We have to say one more thing about matrices. They can be cut into blocks (which are 
smaller matrices). This often happens naturally. Here is a 4 by 6 matrix broken into blocks 
of size 2 by 2—in this example each block is just J: 


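$$
\begin{matrix} \text{4 by 6 matrix} \\ \text{2 by 2 blocks give} \\ \text{2 by 3 block matrix} \end{matrix}
\qquad
A = \left[\begin{array}{cc|cc|cc} 1 & 0 & 1 & 0 & 1 & 0 \\ 0 & 1 & 0 & 1 & 0 & 1 \\ \hline 1 & 0 & 1 & 0 & 1 & 0 \\ 0 & 1 & 0 & 1 & 0 & 1 \end{array}\right] = \begin{bmatrix} I & I & I \\ I & I & I \end{bmatrix}.
$$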

If B is also 4 by 6 and the block sizes match, you can add A + B a block at a time. 

You have seen block matrices before. The right side vector b was placed next to A 
in the “augmented matrix”. Then [A b] has two blocks of different sizes. Multiplying 
by an elimination matrix gave [EA Eb]. No problem to multiply blocks times blocks, 
when their shapes permit. 


Block multiplication If blocks of A can multiply blocks of B, then block multiplication 
of AB is allowed. Cuts between columns of A match cuts between rows of B. 


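$$
\begin{bmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{bmatrix}\begin{bmatrix} B_{11} \\ B_{21} \end{bmatrix} = \begin{bmatrix} A_{11}B_{11} + A_{12}B_{21} \\ A_{21}B_{11} + A_{22}B_{21} \end{bmatrix}. \tag{4}
$$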


This equation is the same as if the blocks were numbers (which are 1 by 1 blocks). We are 
careful to keep A’s in front of B’s, because BA can be different. 
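Block multiplication can be compared with the ordinary product directly. This is a minimal sketch in plain Python, assuming a 4 by 4 matrix A cut into four 2 by 2 blocks and a 4 by 2 matrix B cut into two 2 by 2 blocks; the entries and the helper names `matmul`, `add`, and `block` are chosen only for illustration.

```python
def matmul(X, Y):
    """Ordinary product of two matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))]
            for i in range(len(X))]

def add(X, Y):
    """Entry by entry sum of two matrices of the same shape."""
    return [[x + y for x, y in zip(x_row, y_row)] for x_row, y_row in zip(X, Y)]

def block(M, rows, cols):
    """The submatrix of M with the given row range and column range."""
    return [row[cols[0]:cols[1]] for row in M[rows[0]:rows[1]]]

A = [[1, 2, 0, 1], [3, 4, 1, 0], [0, 1, 2, 2], [5, 0, 1, 3]]
B = [[1, 0], [2, 1], [0, 3], [4, 1]]

# The cut between columns 2 and 3 of A matches the cut between rows 2 and 3 of B.
A11, A12 = block(A, (0, 2), (0, 2)), block(A, (0, 2), (2, 4))
A21, A22 = block(A, (2, 4), (0, 2)), block(A, (2, 4), (2, 4))
B11, B21 = block(B, (0, 2), (0, 2)), block(B, (2, 4), (0, 2))

top = add(matmul(A11, B11), matmul(A12, B21))
bottom = add(matmul(A21, B11), matmul(A22, B21))
by_blocks = top + bottom          # stack the two block rows
print(by_blocks == matmul(A, B))
```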


Main point When matrices split into blocks, it is often simpler to see how they act. The 
block matrix of J’s above is much clearer than the original 4 by 6 matrix A. 
Example 3 (Important special case) Let the blocks of A be its n columns. Let the
blocks of B be its n rows. Then block multiplication AB adds up columns times rows:


$$
\text{Columns times rows} \qquad \begin{bmatrix} | & & | \\ a_1 & \cdots & a_n \\ | & & | \end{bmatrix}\begin{bmatrix} - & b_1 & - \\ & \vdots & \\ - & b_n & - \end{bmatrix} = \Big[\, a_1b_1 + \cdots + a_nb_n \,\Big]. \tag{5}
$$


This is Rule 4 to multiply matrices. Here is a numerical example:


[The numerical example is illegible in the source.]


Summary The usual way, rows times columns, gives four dot products (8 multiplications). 
The new way, columns times rows, gives two full matrices (the same 8 multiplications). 


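Example 4 (Elimination by blocks) Suppose the first column of A contains 1, 3, 4.
To change 3 and 4 to 0 and 0, multiply the pivot row by 3 and 4 and subtract. Those
row operations are really multiplications by elimination matrices $E_{21}$ and $E_{31}$:

One at a time

$$E_{21} = \begin{bmatrix} 1 & 0 & 0 \\ -3 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} \quad \text{and} \quad E_{31} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ -4 & 0 & 1 \end{bmatrix}$$

The “block idea” is to do both eliminations with one matrix E. That matrix clears out the
whole first column of A below the pivot a = 1:

$$E = \begin{bmatrix} 1 & 0 & 0 \\ -3 & 1 & 0 \\ -4 & 0 & 1 \end{bmatrix} \quad \text{multiplies} \quad \begin{bmatrix} 1 & x & x \\ 3 & x & x \\ 4 & x & x \end{bmatrix} \quad \text{to give} \quad EA = \begin{bmatrix} 1 & x & x \\ 0 & x & x \\ 0 & x & x \end{bmatrix}$$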


Using inverse matrices, a block matrix E can do elimination on a whole (block) column.
Suppose a matrix has four blocks A, B, C, D. Watch how E eliminates C by blocks:

Block elimination

$$\begin{bmatrix} I & 0 \\ -CA^{-1} & I \end{bmatrix} \begin{bmatrix} A & B \\ C & D \end{bmatrix} = \begin{bmatrix} A & B \\ 0 & D - CA^{-1}B \end{bmatrix}$$

Elimination multiplies the first row [A B] by $CA^{-1}$ (previously c/a). It subtracts from
C to get a zero block in the first column. It subtracts from D to get $S = D - CA^{-1}B$.
This is ordinary elimination, a column at a time, using blocks. The pivot block is A.

That final block is $D - CA^{-1}B$, just like d − cb/a. This is called the Schur complement.
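Block elimination can be tried on numbers. The following is a sketch under stated assumptions: the 2 by 2 blocks `A`, `B`, `C`, `D` are hypothetical, exact arithmetic comes from Python's `fractions` module, and the helper names are illustrative. It builds E from $-CA^{-1}$ and confirms that the lower left block becomes zero while the lower right block becomes the Schur complement.

```python
from fractions import Fraction

def matmul(X, Y):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]

def matsub(X, Y):
    return [[x - y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]

def hstack(X, Y):
    """Place two blocks side by side."""
    return [rx + ry for rx, ry in zip(X, Y)]

def inv2(M):
    """Inverse of a 2 by 2 block by the ad - bc formula (assumes ad - bc is not zero)."""
    (a, b), (c, d) = M
    det = Fraction(a * d - b * c)
    return [[d / det, -b / det], [-c / det, a / det]]

# Hypothetical blocks; A is the pivot block and must be invertible.
A = [[2, 1], [1, 1]]
B = [[1, 0], [2, 1]]
C = [[3, 1], [0, 2]]
D = [[5, 4], [6, 7]]
I2 = [[1, 0], [0, 1]]
Z = [[0, 0], [0, 0]]

M = hstack(A, B) + hstack(C, D)                 # the 4 by 4 matrix [A B; C D]
CAinv = matmul(C, inv2(A))
E = hstack(I2, Z) + hstack([[-x for x in row] for row in CAinv], I2)

EM = matmul(E, M)
S = matsub(D, matmul(CAinv, B))                 # Schur complement D - C A^{-1} B
print(S)
```

Only the pivot block A is inverted. The blocks B, C, D may be singular.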
= REVIEW OF THE KEY IDEAS =


1. The (i, j) entry of AB is (row i of A) · (column j of B).

2. An m by n matrix times an n by p matrix uses mnp separate multiplications.

3. A times BC equals AB times C (surprisingly important).

4. AB is also the sum of these n matrices: (column j of A) times (row j of B).

5. Block multiplication is allowed when the block shapes match correctly.

6. Block elimination produces the Schur complement $D - CA^{-1}B$.


= WORKED EXAMPLES =


2.4 A A graph or a network has n nodes. Its adjacency matrix S is n by n. This is a
0-1 matrix with $s_{ij} = 1$ when nodes i and j are connected by an edge.


[Figure: undirected graph on nodes 1, 2, 3, 4 with edges 1-2, 1-3, 2-3, 2-4, 3-4.]

Adjacency matrix: square and symmetric for undirected graphs (edges go both ways)

$$S = \begin{bmatrix} 0 & 1 & 1 & 0 \\ 1 & 0 & 1 & 1 \\ 1 & 1 & 0 & 1 \\ 0 & 1 & 1 & 0 \end{bmatrix}$$


The matrix $S^2$ has a useful interpretation. $(S^2)_{ij}$ counts the walks of length 2 between
node i and node j. Between nodes 2 and 3 the graph has two walks: go via 1 or go via 4.
From node 1 to node 1, there are also two walks: 1-2-1 and 1-3-1.


$$S^2 = \begin{bmatrix} 2 & 1 & 1 & 2 \\ 1 & 3 & 2 & 1 \\ 1 & 2 & 3 & 1 \\ 2 & 1 & 1 & 2 \end{bmatrix} \qquad S^3 = \begin{bmatrix} 2 & 5 & 5 & 2 \\ 5 & 4 & 5 & 5 \\ 5 & 5 & 4 & 5 \\ 2 & 5 & 5 & 2 \end{bmatrix}$$


Can you find 5 walks of length 3 between nodes 1 and 2?
The real question is why $S^N$ counts all the N-step paths between pairs of nodes. Start
with $S^2$ and look at matrix multiplication by dot products:


$$(S^2)_{ij} = (\text{row } i \text{ of } S) \cdot (\text{column } j \text{ of } S) = s_{i1}s_{1j} + s_{i2}s_{2j} + s_{i3}s_{3j} + s_{i4}s_{4j}. \tag{7}$$


If there is a 2-step path i → 1 → j, the first multiplication gives $s_{i1}s_{1j} = (1)(1) = 1$.
If i → 1 → j is not a path, then either i → 1 is missing or 1 → j is missing. So the
multiplication gives $s_{i1}s_{1j} = 0$ in that case.
$(S^2)_{ij}$ is adding up 1’s for all the 2-step paths i → k → j. So it counts those paths.
In the same way $S^{N-1}S$ will count N-step paths, because those are (N − 1)-step paths
from i to k followed by one step from k to j. Matrix multiplication is exactly suited to
counting paths on a graph, such as channels of communication between employees in a company.
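The counting argument can be tested by brute force. This sketch assumes the 4-node graph of this worked example has edges 1-2, 1-3, 2-3, 2-4, 3-4 (read off from the walks described in the text). The function names are illustrative. It compares entries of $S^2$ and $S^3$ with a direct enumeration of walks.

```python
def matmul(X, Y):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]

def count_walks(S, i, j, N):
    """Count N-step walks from node i to node j by trying every next step."""
    if N == 0:
        return 1 if i == j else 0
    return sum(count_walks(S, k, j, N - 1) for k in range(len(S)) if S[i][k])

# Adjacency matrix (nodes 1..4 are indices 0..3); the edge list is an assumption.
S = [[0, 1, 1, 0],
     [1, 0, 1, 1],
     [1, 1, 0, 1],
     [0, 1, 1, 0]]

S2 = matmul(S, S)
S3 = matmul(S2, S)

print(S2[1][2])   # walks of length 2 between nodes 2 and 3
print(S3[0][1])   # walks of length 3 between nodes 1 and 2
```

The recursion in `count_walks` is the same idea as $S^{N-1}S$: one step to a neighbor k, then an (N − 1)-step walk from k to j.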


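2.4 B For these matrices, when does AB = BA? When does BC = CB? When does
A times BC equal AB times C? Give the conditions on their entries p, q, r, z:


$$A = \begin{bmatrix} p & 0 \\ q & r \end{bmatrix} \qquad B = \begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix} \qquad C = \begin{bmatrix} 0 & z \\ 0 & 0 \end{bmatrix}$$

If p, q, r, 1, z are 4 by 4 blocks instead of numbers, do the answers change?


Solution First of all, A times BC always equals AB times C. Parentheses are not
needed in A(BC) = (AB)C = ABC. But we must keep the matrices in this order:


$$\text{Usually } AB \neq BA \qquad AB = \begin{bmatrix} p & p \\ q & q + r \end{bmatrix} \qquad BA = \begin{bmatrix} p + q & r \\ q & r \end{bmatrix}$$

$$\text{By chance } BC = CB \qquad BC = \begin{bmatrix} 0 & z \\ 0 & 0 \end{bmatrix} \qquad CB = \begin{bmatrix} 0 & z \\ 0 & 0 \end{bmatrix}$$


Comparing entries, AB = BA requires q = 0 and p = r.
B and C happen to commute. Part of the explanation is that the diagonal of B is I, which
commutes with all 2 by 2 matrices. When p, q, r, z are 4 by 4 blocks and 1 changes to I,
all these products remain correct. So the answers are the same.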


Problem Set 2.4 


Problems 1-16 are about the laws of matrix multiplication. 


1 A is 3 by 5, B is 5 by 3, C is 5 by 1, and D is 3 by 1. All entries are 1. Which of
these matrix operations are allowed, and what are the results?


BA    AB    ABD    DC    A(B + C).

2 What rows or columns or matrices do you multiply to find 


(a) the second column of AB ? 

(b) the first row of AB? 

(c) the entry in row 3, column 5 of AB? 
(d) the entry in row 1, column 1 of CDE? 


3 Add AB to AC and compare with A(B+C): 


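$$A = \begin{bmatrix} 1 & 5 \\ 2 & 3 \end{bmatrix} \quad \text{and} \quad B = \begin{bmatrix} 0 & 2 \\ 0 & 1 \end{bmatrix} \quad \text{and} \quad C = \begin{bmatrix} 3 & 1 \\ 0 & 0 \end{bmatrix}$$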


4 In Problem 3, multiply A times BC. Then multiply AB times C.

5 Compute $A^2$ and $A^3$. Make a prediction for $A^5$ and $A^n$:


$$A = \begin{bmatrix} 1 & b \\ 0 & 1 \end{bmatrix} \quad \text{and} \quad A = \begin{bmatrix} 2 & 2 \\ 0 & 0 \end{bmatrix}$$


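6 Show that $(A + B)^2$ is different from $A^2 + 2AB + B^2$, when


$$A = \begin{bmatrix} 1 & 2 \\ 0 & 0 \end{bmatrix} \quad \text{and} \quad B = \begin{bmatrix} 1 & 0 \\ 3 & 0 \end{bmatrix}$$

Write down the correct rule for $(A + B)(A + B) = A^2 + \underline{\qquad} + B^2$.
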
7 True or false. Give a specific example when false: 


(a) If columns 1 and 3 of B are the same, so are columns 1 and 3 of AB. 
(b) If rows 1 and 3 of B are the same, so are rows 1 and 3 of AB. 
(c) If rows 1 and 3 of A are the same, so are rows 1 and 3 of ABC. 
(d) $(AB)^2 = A^2B^2$.

8 How is each row of DA and EA related to the rows of A, when 


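$$D = \begin{bmatrix} 3 & 0 \\ 0 & 5 \end{bmatrix} \quad \text{and} \quad E = \begin{bmatrix} 0 & 1 \\ 0 & 1 \end{bmatrix} \quad \text{and} \quad A = \begin{bmatrix} a & b \\ c & d \end{bmatrix}$$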


How is each column of AD and AE related to the columns of A? 


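9 Row 1 of A is added to row 2. This gives EA below. Then column 1 of EA is added
to column 2 to produce (EA)F:


$$EA = \begin{bmatrix} 1 & 0 \\ 1 & 1 \end{bmatrix} \begin{bmatrix} a & b \\ c & d \end{bmatrix} = \begin{bmatrix} a & b \\ a + c & b + d \end{bmatrix}$$

$$\text{and} \quad (EA)F = (EA) \begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix} = \begin{bmatrix} a & a + b \\ a + c & a + c + b + d \end{bmatrix}$$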


(a) Do those steps in the opposite order. First add column 1 of A to column 2 
by AF, then add row 1 of AF to row 2 by E( AF). 


(b) Compare with (EA)F. What law is obeyed by matrix multiplication? 


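10 Row 1 of A is again added to row 2 to produce EA. Then F adds row 2 of EA to
row 1. The result is F(EA):


$$F(EA) = \begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix} \begin{bmatrix} a & b \\ a + c & b + d \end{bmatrix} = \begin{bmatrix} 2a + c & 2b + d \\ a + c & b + d \end{bmatrix}$$


(a) Do those steps in the opposite order: first add row 2 to row 1 by FA, then add
row 1 of FA to row 2.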


(b) What law is or is not obeyed by matrix multiplication? 
11 This fact still amazes me. If you do a row operation on A and then a column
operation, the result is the same as if you did the column operation first. (Try it.) Why is
this true?


12 (3 by 3 matrices) Choose the only B so that for every matrix A
(a) BA = 4A
(b) BA = 4B
(c) BA has rows 1 and 3 of A reversed and row 2 unchanged
(d) All rows of BA are the same as row 1 of A.


13 Suppose AB = BA and AC = CA for these two particular matrices B and C:


$$A = \begin{bmatrix} a & b \\ c & d \end{bmatrix} \quad \text{commutes with} \quad B = \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix} \quad \text{and} \quad C = \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix}$$


Prove that a = d and b = c = 0. Then A is a multiple of I. The only matrices that
commute with B and C and all other 2 by 2 matrices are A = multiple of I.


14 Which of the following matrices are guaranteed to equal $(A - B)^2$: $A^2 - B^2$,
$(B - A)^2$, $A^2 - 2AB + B^2$, $A(A - B) - B(A - B)$, $A^2 - AB - BA + B^2$?


15 True or false:


(a) If $A^2$ is defined then A is necessarily square.
(b) If AB and BA are defined then A and B are square.
(c) If AB and BA are defined then AB and BA are square.
(d) If AB = B then A = I.


16 If A is m by n, how many separate multiplications are involved when


(a) A multiplies a vector x with n components?
(b) A multiplies an n by p matrix B?
(c) A multiplies itself to produce $A^2$? Here m = n.


17 For $A = \begin{bmatrix} 2 & -1 \\ 3 & -2 \end{bmatrix}$ and $B = \begin{bmatrix} 1 & 0 & 4 \\ 1 & 0 & 6 \end{bmatrix}$, compute these answers and nothing more:
(a) column 2 of AB
(b) row 2 of AB
(c) row 2 of $AA = A^2$
(d) row 2 of $AAA = A^3$.

Problems 18-20 use $a_{ij}$ for the entry in row i, column j of A.

18 Write down the 3 by 3 matrix A whose entries are

(a) $a_{ij}$ = minimum of i and j
(b) $a_{ij} = (-1)^{i+j}$
(c) $a_{ij} = i/j$.


19 What words would you use to describe each of these classes of matrices? Give a 3
by 3 example in each class. Which matrix belongs to all four classes?


(a) $a_{ij} = 0$ if $i \neq j$
(b) $a_{ij} = 0$ if $i < j$
(c) $a_{ij} = a_{ji}$
(d) $a_{ij} = a_{1j}$.

20 The entries of A are $a_{ij}$. Assuming that zeros don’t appear, what is
(a) the first pivot?
(b) the multiplier $\ell_{31}$ of row 1 to be subtracted from row 3?
(c) the new entry that replaces $a_{32}$ after that subtraction?
(d) the second pivot?

Problems 21-24 involve powers of A.


21 Compute $A^2$, $A^3$, $A^4$ and also $Av$, $A^2v$, $A^3v$, $A^4v$ for


$$A = \begin{bmatrix} 0 & 2 & 0 & 0 \\ 0 & 0 & 2 & 0 \\ 0 & 0 & 0 & 2 \\ 0 & 0 & 0 & 0 \end{bmatrix} \quad \text{and} \quad v = \begin{bmatrix} x \\ y \\ z \\ t \end{bmatrix}.$$


22 By trial and error find real nonzero 2 by 2 matrices such that


$A^2 = -I$    $BC = 0$    $DE = -ED$ (not allowing DE = 0).


23 (a) Find a nonzero matrix A for which $A^2 = 0$.

(b) Find a matrix that has $A^2 \neq 0$ but $A^3 = 0$.


24 By experiment with n = 2 and n = 3 predict $A^n$ for these matrices:


$$A_1 = \begin{bmatrix} 2 & 1 \\ 0 & 1 \end{bmatrix} \quad \text{and} \quad A_2 = \begin{bmatrix} 1 & 1 \\ 1 & 1 \end{bmatrix} \quad \text{and} \quad A_3 = \begin{bmatrix} a & b \\ 0 & 0 \end{bmatrix}$$

Problems 25-31 use column-row multiplication and block multiplication.


25 Multiply A times I using columns of A (3 by 3) times rows of I.


26 Multiply AB using columns times rows:


$$AB = \begin{bmatrix} 1 & 0 \\ 2 & 4 \\ 2 & 1 \end{bmatrix} \begin{bmatrix} 3 & 3 & 0 \\ 1 & 2 & 1 \end{bmatrix} = \begin{bmatrix} 1 \\ 2 \\ 2 \end{bmatrix} \begin{bmatrix} 3 & 3 & 0 \end{bmatrix} + \underline{\qquad} = \underline{\qquad}.$$


27 Show that the product of upper triangular matrices is always upper triangular:

$$AB = \begin{bmatrix} x & x & x \\ 0 & x & x \\ 0 & 0 & x \end{bmatrix} \begin{bmatrix} x & x & x \\ 0 & x & x \\ 0 & 0 & x \end{bmatrix} = \begin{bmatrix} & & \\ 0 & & \\ 0 & 0 & \end{bmatrix}.$$

Proof using dot products (Row times column) (Row 2 of A) · (column 1 of B) = 0.
Which other dot products give zeros?

Proof using full matrices (Column times row) Draw x’s and 0’s in (column 2 of A)
times (row 2 of B). Also show (column 3 of A) times (row 3 of B).


28 Draw the cuts in A (2 by 3) and B (3 by 4) and AB to show how each of the four
multiplication rules is really a block multiplication:


(1) Matrix A times columns of B. Columns of AB
(2) Rows of A times the matrix B. Rows of AB
(3) Rows of A times columns of B. Inner products (numbers in AB)
(4) Columns of A times rows of B. Outer products (matrices add to AB)


29 Which matrices $E_{21}$ and $E_{31}$ produce zeros in the (2, 1) and (3, 1) positions of $E_{21}A$
and $E_{31}A$?


$$A = \begin{bmatrix} 2 & 1 & 0 \\ -2 & 0 & 1 \\ 8 & 5 & 3 \end{bmatrix}$$


Find the single matrix $E = E_{31}E_{21}$ that produces both zeros at once. Multiply EA.


30 Block multiplication says that column 1 is eliminated by


$$EA = \begin{bmatrix} 1 & 0 \\ -c/a & I \end{bmatrix} \begin{bmatrix} a & b \\ c & D \end{bmatrix} = \begin{bmatrix} a & b \\ 0 & D - cb/a \end{bmatrix}.$$


In Problem 29, what numbers go into c and D and what is D − cb/a?


31 With $i^2 = -1$, the product of (A + iB) and (x + iy) is Ax + iBx + iAy − By. Use
blocks to separate the real part without i from the imaginary part that multiplies i:


$$\begin{bmatrix} A & -B \\ ? & ? \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} Ax - By \\ ? \end{bmatrix} \quad \begin{matrix} \text{real part} \\ \text{imaginary part} \end{matrix}$$

32 (Very important) Suppose you solve Ax = b for three special right sides b:

$$Ax_1 = \begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix} \quad \text{and} \quad Ax_2 = \begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix} \quad \text{and} \quad Ax_3 = \begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix}.$$


If the three solutions $x_1, x_2, x_3$ are the columns of a matrix X, what is A times X?


33 If the three solutions in Question 32 are $x_1 = (1, 1, 1)$ and $x_2 = (0, 1, 1)$ and
$x_3 = (0, 0, 1)$, solve Ax = b when b = (3, 5, 8). Challenge problem: What is A?


34 Find all matrices $A = \begin{bmatrix} a & b \\ c & d \end{bmatrix}$ that satisfy $A \begin{bmatrix} 1 & 1 \\ 1 & 1 \end{bmatrix} = \begin{bmatrix} 1 & 1 \\ 1 & 1 \end{bmatrix} A$.


35 Suppose a “circle graph” has 4 nodes connected (in both directions) by edges around
a circle. What is its adjacency matrix S from Worked Example 2.4 A? What is $S^2$?
Find all the 2-step paths predicted by $S^2$.


Challenge Problems


36 Practical question Suppose A is m by n, B is n by p, and C is p by q. Then the
multiplication count is mnp for AB + mpq for (AB)C. The same matrix comes
from A times BC with mnq + npq separate multiplications. Notice npq for BC.

(a) If A is 2 by 4, B is 4 by 7, and C is 7 by 10, do you prefer (AB)C or A(BC)?

(b) With N-component vectors, would you choose $(u^Tv)w^T$ or $u^T(vw^T)$?

(c) Divide by mnpq to show that (AB)C is faster when $n^{-1} + q^{-1} < m^{-1} + p^{-1}$.


37 To prove that (AB)C = A(BC), use the column vectors $b_1, \ldots, b_n$ of B. First
suppose that C has only one column c with entries $c_1, \ldots, c_n$:

AB has columns $Ab_1, \ldots, Ab_n$ and then (AB)c equals $c_1Ab_1 + \cdots + c_nAb_n$.
Bc has one column $c_1b_1 + \cdots + c_nb_n$ and then A(Bc) equals $A(c_1b_1 + \cdots + c_nb_n)$.
Linearity gives equality of those two sums. This proves (AB)c = A(Bc). The same
is true for all other ____ of C. Therefore (AB)C = A(BC). Apply to inverses:
If BA = I and AC = I, prove that the left-inverse B equals the right-inverse C.


38 (a) Suppose A has rows $a_1^T, \ldots, a_m^T$. Why does $A^TA$ equal $a_1a_1^T + \cdots + a_ma_m^T$?


(b) If C is a diagonal matrix with $c_1, \ldots, c_m$ on its diagonal, find a similar sum of
columns times rows for $A^TCA$. First do an example with m = n = 2.

2.5 Inverse Matrices


1 If the square matrix A has an inverse, then both $A^{-1}A = I$ and $AA^{-1} = I$.
2 The algorithm to test invertibility is elimination: A must have n (nonzero) pivots.
3 The algebra test for invertibility is the determinant of A: det A must not be zero.
4 The equation that tests for invertibility is Ax = 0: x = 0 must be the only solution.
5 If A and B (same size) are invertible then so is AB: $(AB)^{-1} = B^{-1}A^{-1}$.
6 $AA^{-1} = I$ is n equations for n columns of $A^{-1}$. Gauss-Jordan eliminates [A I] to [I $A^{-1}$].
7 The last page of the book gives 14 equivalent conditions for a square A to be invertible.


Suppose A is a square matrix. We look for an “inverse matrix” $A^{-1}$ of the same size,
such that $A^{-1}$ times A equals I. Whatever A does, $A^{-1}$ undoes. Their product is the
identity matrix, which does nothing to a vector, so $A^{-1}Ax = x$. But $A^{-1}$ might not exist.

What a matrix mostly does is to multiply a vector x. Multiplying Ax = b by $A^{-1}$
gives $A^{-1}Ax = A^{-1}b$. This is $x = A^{-1}b$. The product $A^{-1}A$ is like multiplying by
a number and then dividing by that number. A number has an inverse if it is not zero;
matrices are more complicated and more interesting. The matrix $A^{-1}$ is called “A inverse.”


DEFINITION The matrix A is invertible if there exists a matrix $A^{-1}$ that “inverts” A:


Two-sided inverse    $A^{-1}A = I$ and $AA^{-1} = I$.    (1)


Not all matrices have inverses. This is the first question we ask about a square matrix:
Is A invertible? We don’t mean that we immediately calculate $A^{-1}$. In most problems
we never compute it! Here are six “notes” about $A^{-1}$.


Note 1 The inverse exists if and only if elimination produces n pivots (row exchanges
are allowed). Elimination solves Ax = b without explicitly using the matrix $A^{-1}$.


Note 2 The matrix A cannot have two different inverses. Suppose BA = I and also
AC = I. Then B = C, according to this “proof by parentheses”:


B(AC) = (BA)C gives BI = IC or B = C.    (2)


This shows that a left-inverse B (multiplying from the left) and a right-inverse C
(multiplying A from the right to give AC = I) must be the same matrix.


Note 3 If A is invertible, the one and only solution to Ax = b is $x = A^{-1}b$:


Multiply Ax = b by $A^{-1}$. Then $x = A^{-1}Ax = A^{-1}b$.
Note 4 (Important) Suppose there is a nonzero vector x such that Ax = 0. Then A
cannot have an inverse. No matrix can bring 0 back to x.


If A is invertible, then Ax = 0 can only have the zero solution $x = A^{-1}0 = 0$.


Note 5 A 2 by 2 matrix is invertible if and only if ad − bc is not zero:


$$\text{2 by 2 Inverse:} \quad \begin{bmatrix} a & b \\ c & d \end{bmatrix}^{-1} = \frac{1}{ad - bc} \begin{bmatrix} d & -b \\ -c & a \end{bmatrix}. \tag{3}$$



This number ad — bc is the determinant of A. A matrix is invertible if its determinant is not 
zero (Chapter 5). The test for n pivots is usually decided before the determinant appears. 
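Formula (3) translates directly into a few lines of code. This is a minimal sketch in plain Python with exact fractions. The function name `inverse_2x2` and the choice to return `None` for a singular matrix are illustrative assumptions, not part of the text.

```python
from fractions import Fraction

def inverse_2x2(a, b, c, d):
    """Inverse of [[a, b], [c, d]] from formula (3), or None when ad - bc = 0."""
    det = a * d - b * c
    if det == 0:
        return None                      # the test in Note 5 fails
    det = Fraction(det)
    return [[d / det, -b / det],
            [-c / det, a / det]]

print(inverse_2x2(2, 3, 4, 7))   # determinant 2, so the inverse exists
print(inverse_2x2(1, 2, 1, 2))   # determinant 0, so there is no inverse
```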


Note 6 A diagonal matrix has an inverse provided no diagonal entries are zero:


$$\text{If } A = \begin{bmatrix} d_1 & & \\ & \ddots & \\ & & d_n \end{bmatrix} \text{ then } A^{-1} = \begin{bmatrix} 1/d_1 & & \\ & \ddots & \\ & & 1/d_n \end{bmatrix}.$$


Example 1 The 2 by 2 matrix $A = \begin{bmatrix} 1 & 2 \\ 1 & 2 \end{bmatrix}$ is not invertible. It fails the test in Note 5,
because ad − bc equals 2 − 2 = 0. It fails the test in Note 4, because Ax = 0 when
x = (2, −1). It fails to have two pivots as required by Note 1.

Elimination turns the second row of this matrix A into a zero row.


The Inverse of a Product AB 


For two nonzero numbers a and b, the sum a + b might or might not be invertible. The
numbers a = 3 and b = −3 have inverses 1/3 and −1/3. Their sum a + b = 0 has no inverse.
But the product ab = −9 does have an inverse, which is 1/3 times −1/3.

For two matrices A and B, the situation is similar. It is hard to say much about the
invertibility of A + B. But the product AB has an inverse, if and only if the two factors A
and B are separately invertible (and the same size). The important point is that $A^{-1}$ and
$B^{-1}$ come in reverse order:

If A and B are invertible then so is AB. The inverse of a product AB is

$$(AB)^{-1} = B^{-1}A^{-1}. \tag{4}$$

To see why the order is reversed, multiply AB times $B^{-1}A^{-1}$. Inside that is $BB^{-1} = I$:


Inverse of AB    $(AB)(B^{-1}A^{-1}) = AIA^{-1} = AA^{-1} = I$.


We moved parentheses to multiply $BB^{-1}$ first. Similarly $B^{-1}A^{-1}$ times AB equals I.
$B^{-1}A^{-1}$ illustrates a basic rule of mathematics: Inverses come in reverse order.
It is also common sense: If you put on socks and then shoes, the first to be taken off
are the ____. The same reverse order applies to three or more matrices:


Reverse order    $(ABC)^{-1} = C^{-1}B^{-1}A^{-1}$.    (5)


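Example 2 Inverse of an elimination matrix. If E subtracts 5 times row 1 from row 2,
then $E^{-1}$ adds 5 times row 1 to row 2:


$$E = \begin{bmatrix} 1 & 0 & 0 \\ -5 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} \quad \text{and} \quad E^{-1} = \begin{bmatrix} 1 & 0 & 0 \\ 5 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}$$


Multiply $EE^{-1}$ to get the identity matrix I. Also multiply $E^{-1}E$ to get I. We are adding
and subtracting the same 5 times row 1. If AC = I then automatically CA = I.


For square matrices, an inverse on one side is automatically an inverse on the other side.


Example 3 Suppose F subtracts 4 times row 2 from row 3, and $F^{-1}$ adds it back:


$$F = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & -4 & 1 \end{bmatrix} \quad \text{and} \quad F^{-1} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 4 & 1 \end{bmatrix}$$


Now multiply F by the matrix E in Example 2 to find FE. Also multiply $E^{-1}$ times $F^{-1}$
to find $(FE)^{-1}$. Notice the orders FE and $E^{-1}F^{-1}$!


$$FE = \begin{bmatrix} 1 & 0 & 0 \\ -5 & 1 & 0 \\ 20 & -4 & 1 \end{bmatrix} \quad \text{is inverted by} \quad E^{-1}F^{-1} = \begin{bmatrix} 1 & 0 & 0 \\ 5 & 1 & 0 \\ 0 & 4 & 1 \end{bmatrix}. \tag{6}$$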


The result is beautiful and correct. The product FE contains “20” but its inverse doesn’t.
E subtracts 5 times row 1 from row 2. Then F subtracts 4 times the new row 2 (changed
by row 1) from row 3. In this order FE, row 3 feels an effect from row 1.


In the order $E^{-1}F^{-1}$, that effect does not happen. First $F^{-1}$ adds 4 times row 2 to
row 3. After that, $E^{-1}$ adds 5 times row 1 to row 2. There is no 20, because row 3 doesn’t
change again. In this order $E^{-1}F^{-1}$, row 3 feels no effect from row 1.
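The two orders in Example 3 are easy to multiply out by machine. The sketch below uses plain Python lists of rows (the helper `matmul` is an illustrative name) and shows where the 20 appears and where it does not.

```python
def matmul(X, Y):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]

E     = [[1, 0, 0], [-5, 1, 0], [0, 0, 1]]   # subtract 5 * row 1 from row 2
F     = [[1, 0, 0], [0, 1, 0], [0, -4, 1]]   # subtract 4 * row 2 from row 3
E_inv = [[1, 0, 0], [5, 1, 0], [0, 0, 1]]
F_inv = [[1, 0, 0], [0, 1, 0], [0, 4, 1]]
I     = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]

FE = matmul(F, E)                 # contains 20 in row 3
good = matmul(E_inv, F_inv)       # reverse order: multipliers 5 and 4 fall into place
wrong = matmul(F_inv, E_inv)      # same order as FE: a 20 appears again

print(FE)
print(good)
print(matmul(FE, good) == I, matmul(FE, wrong) == I)
```

Only the reverse-order product inverts FE. The same-order product `wrong` is a different matrix.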

This is why the next section chooses A = LU, to go back from the triangular U to A. 
The multipliers fall into place perfectly in the lower triangular L. 


In elimination order F follows E. In reverse order $E^{-1}$ follows $F^{-1}$.
$E^{-1}F^{-1}$ is quick. The multipliers 5, 4 fall into place below the diagonal of 1’s.
Calculating $A^{-1}$ by Gauss-Jordan Elimination


I hinted that $A^{-1}$ might not be explicitly needed. The equation Ax = b is solved by
$x = A^{-1}b$. But it is not necessary or efficient to compute $A^{-1}$ and multiply it times b.
Elimination goes directly to x. And elimination is also the way to calculate $A^{-1}$, as we
now show. The Gauss-Jordan idea is to solve $AA^{-1} = I$, finding each column of $A^{-1}$.

A multiplies the first column of $A^{-1}$ (call that $x_1$) to give the first column of I (call
that $e_1$). This is our equation $Ax_1 = e_1 = (1, 0, 0)$. There will be two more equations.
Each of the columns $x_1, x_2, x_3$ of $A^{-1}$ is multiplied by A to produce a column of I:

$$\text{3 columns of } A^{-1} \qquad AA^{-1} = A \begin{bmatrix} x_1 & x_2 & x_3 \end{bmatrix} = \begin{bmatrix} e_1 & e_2 & e_3 \end{bmatrix} = I. \tag{7}$$

To invert a 3 by 3 matrix A, we have to solve three systems of equations: $Ax_1 = e_1$ and
$Ax_2 = e_2 = (0, 1, 0)$ and $Ax_3 = e_3 = (0, 0, 1)$. Gauss-Jordan finds $A^{-1}$ this way.


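The Gauss-Jordan method computes $A^{-1}$ by solving all n equations together.
Usually the “augmented matrix” [A b] has one extra column b. Now we have three
right sides $e_1, e_2, e_3$ (when A is 3 by 3). They are the columns of I, so the augmented
matrix is really the block matrix [A I]. I take this chance to invert my favorite matrix K,
with 2’s on the main diagonal and −1’s next to the 2’s:


$$\begin{bmatrix} K & e_1 & e_2 & e_3 \end{bmatrix} = \left[\begin{array}{rrr|rrr} 2 & -1 & 0 & 1 & 0 & 0 \\ -1 & 2 & -1 & 0 & 1 & 0 \\ 0 & -1 & 2 & 0 & 0 & 1 \end{array}\right] \quad \text{Start Gauss-Jordan on } K$$

$$\to \left[\begin{array}{rrr|rrr} 2 & -1 & 0 & 1 & 0 & 0 \\ 0 & \tfrac{3}{2} & -1 & \tfrac{1}{2} & 1 & 0 \\ 0 & -1 & 2 & 0 & 0 & 1 \end{array}\right] \quad (\tfrac{1}{2} \text{ row 1} + \text{row 2})$$

$$\to \left[\begin{array}{rrr|rrr} 2 & -1 & 0 & 1 & 0 & 0 \\ 0 & \tfrac{3}{2} & -1 & \tfrac{1}{2} & 1 & 0 \\ 0 & 0 & \tfrac{4}{3} & \tfrac{1}{3} & \tfrac{2}{3} & 1 \end{array}\right] \quad (\tfrac{2}{3} \text{ row 2} + \text{row 3})$$


We are halfway to $K^{-1}$. The matrix in the first three columns is U (upper triangular). The
pivots 2, 3/2, 4/3 are on its diagonal. Gauss would finish by back substitution. The contribution
of Jordan is to continue with elimination! He goes all the way to the reduced echelon form
R = I. Rows are added to rows above them, to produce zeros above the pivots:


$$\left(\begin{matrix} \text{Zero above} \\ \text{third pivot} \end{matrix}\right) \to \left[\begin{array}{rrr|rrr} 2 & -1 & 0 & 1 & 0 & 0 \\ 0 & \tfrac{3}{2} & 0 & \tfrac{3}{4} & \tfrac{3}{2} & \tfrac{3}{4} \\ 0 & 0 & \tfrac{4}{3} & \tfrac{1}{3} & \tfrac{2}{3} & 1 \end{array}\right] \quad (\tfrac{3}{4} \text{ row 3} + \text{row 2})$$

$$\left(\begin{matrix} \text{Zero above} \\ \text{second pivot} \end{matrix}\right) \to \left[\begin{array}{rrr|rrr} 2 & 0 & 0 & \tfrac{3}{2} & 1 & \tfrac{1}{2} \\ 0 & \tfrac{3}{2} & 0 & \tfrac{3}{4} & \tfrac{3}{2} & \tfrac{3}{4} \\ 0 & 0 & \tfrac{4}{3} & \tfrac{1}{3} & \tfrac{2}{3} & 1 \end{array}\right] \quad (\tfrac{2}{3} \text{ row 2} + \text{row 1})$$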

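The final Gauss-Jordan step is to divide each row by its pivot. The new pivots are all 1.
We have reached I in the first half of the matrix, because K is invertible. The three
columns of $K^{-1}$ are in the second half of [I $K^{-1}$]:


$$\begin{matrix} (\text{divide by } 2) \\ (\text{divide by } \tfrac{3}{2}) \\ (\text{divide by } \tfrac{4}{3}) \end{matrix} \quad \left[\begin{array}{rrr|rrr} 1 & 0 & 0 & \tfrac{3}{4} & \tfrac{1}{2} & \tfrac{1}{4} \\ 0 & 1 & 0 & \tfrac{1}{2} & 1 & \tfrac{1}{2} \\ 0 & 0 & 1 & \tfrac{1}{4} & \tfrac{1}{2} & \tfrac{3}{4} \end{array}\right] = \begin{bmatrix} I & x_1 & x_2 & x_3 \end{bmatrix} = \begin{bmatrix} I & K^{-1} \end{bmatrix}.$$


Starting from the 3 by 6 matrix [K I], we ended with [I $K^{-1}$]. Here is the whole
Gauss-Jordan process on one line for any invertible matrix A:


Gauss-Jordan    Multiply [A I] by $A^{-1}$ to get [I $A^{-1}$].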


The elimination steps create the inverse matrix while changing A to I. For large matrices,
we probably don’t want $A^{-1}$ at all. But for small matrices, it can be very worthwhile to
know the inverse. We add three observations about $K^{-1}$: an important example.


1. K is symmetric across its main diagonal. Then $K^{-1}$ is also symmetric.


2. K is tridiagonal (only three nonzero diagonals). But $K^{-1}$ is a dense matrix with
no zeros. That is another reason we don’t often compute inverse matrices. The
inverse of a band matrix is generally a dense matrix.


3. The product of pivots is 2 (3/2) (4/3) = 4. This number 4 is the determinant of K.


$$K^{-1} \text{ involves division by the determinant of } K \qquad K^{-1} = \frac{1}{4} \begin{bmatrix} 3 & 2 & 1 \\ 2 & 4 & 2 \\ 1 & 2 & 3 \end{bmatrix}. \tag{8}$$

This is why an invertible matrix cannot have a zero determinant: we need to divide.
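The Gauss-Jordan idea can be sketched in a few lines of Python with exact fractions. This is one possible arrangement, not the only one: it divides each pivot row by its pivot as soon as the pivot is found (the text divides at the end), and it exchanges rows when a zero appears in the pivot position. The function name and the `None` return for a singular matrix are illustrative assumptions.

```python
from fractions import Fraction

def gauss_jordan_inverse(A):
    """Row-reduce [A I] to [I A^{-1}]; return A^{-1}, or None if a pivot is missing."""
    n = len(A)
    # Build the augmented matrix [A I] with exact fractions.
    M = [[Fraction(x) for x in row] + [Fraction(int(i == j)) for j in range(n)]
         for i, row in enumerate(A)]
    for col in range(n):
        # Find a nonzero pivot in this column, at or below the diagonal.
        pivot_row = next((r for r in range(col, n) if M[r][col] != 0), None)
        if pivot_row is None:
            return None                       # fewer than n pivots: A is singular
        M[col], M[pivot_row] = M[pivot_row], M[col]
        pivot = M[col][col]
        M[col] = [x / pivot for x in M[col]]  # make the pivot equal to 1
        for r in range(n):                    # zeros below AND above the pivot
            if r != col and M[r][col] != 0:
                factor = M[r][col]
                M[r] = [x - factor * y for x, y in zip(M[r], M[col])]
    return [row[n:] for row in M]             # the right half is A^{-1}

K = [[2, -1, 0], [-1, 2, -1], [0, -1, 2]]
K_inv = gauss_jordan_inverse(K)
print(K_inv)
```

For K the result agrees with equation (8). A matrix such as [[1, 2], [1, 2]] returns `None`, because its second column has no pivot.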


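Example 4 Find $A^{-1}$ by Gauss-Jordan elimination starting from $A = \begin{bmatrix} 2 & 3 \\ 4 & 7 \end{bmatrix}$.

$$\begin{bmatrix} A & I \end{bmatrix} = \left[\begin{array}{rr|rr} 2 & 3 & 1 & 0 \\ 4 & 7 & 0 & 1 \end{array}\right] \to \left[\begin{array}{rr|rr} 2 & 3 & 1 & 0 \\ 0 & 1 & -2 & 1 \end{array}\right] \quad (\text{this is } \begin{bmatrix} U & L^{-1} \end{bmatrix})$$

$$\to \left[\begin{array}{rr|rr} 2 & 0 & 7 & -3 \\ 0 & 1 & -2 & 1 \end{array}\right] \to \left[\begin{array}{rr|rr} 1 & 0 & \tfrac{7}{2} & -\tfrac{3}{2} \\ 0 & 1 & -2 & 1 \end{array}\right] \quad (\text{this is } \begin{bmatrix} I & A^{-1} \end{bmatrix}).$$

Example 5 If A is invertible and upper triangular, so is $A^{-1}$. Start with $AA^{-1} = I$.


1 A times column j of $A^{-1}$ equals column j of I, ending with n − j zeros.
2 Back substitution keeps those n − j zeros at the end of column j of $A^{-1}$.
3 Put those columns $[* \ldots * \; 0 \ldots 0]^T$ into $A^{-1}$ and that matrix is upper triangular!

$$A^{-1} = \begin{bmatrix} 1 & -1 & 0 \\ 0 & 1 & -1 \\ 0 & 0 & 1 \end{bmatrix}^{-1} = \begin{bmatrix} 1 & 1 & 1 \\ 0 & 1 & 1 \\ 0 & 0 & 1 \end{bmatrix} \qquad \begin{matrix} \text{Columns } j = 1 \text{ and } 2 \text{ end} \\ \text{with } 3 - j = 2 \text{ and } 1 \text{ zeros.} \end{matrix}$$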


The code for X = inv(A) can use rref, the reduced row echelon form from Chapter 3:


I = eye(n);            % Define the n by n identity matrix
R = rref([A I]);       % Eliminate on the augmented matrix [A I]
X = R(:, n+1:n+n)      % Pick X = inverse of A from the last n columns of R


A must be invertible, or elimination cannot reduce it to I (in the left half of R).

Gauss-Jordan shows why $A^{-1}$ is expensive. We solve n equations for its n columns.
But all those equations involve the same matrix A on the left side (where most of the work
is done). The total cost for $A^{-1}$ is $n^3$ multiplications and subtractions. To solve a single
Ax = b that cost (see the next section) is $n^3/3$.


To solve Ax = b without $A^{-1}$, we deal with one column b to find one column x.


Singular versus Invertible 


We come back to the central question. Which matrices have inverses? The start of this
section proposed the pivot test: $A^{-1}$ exists exactly when A has a full set of n pivots.
(Row exchanges are allowed.) Now we can prove that by Gauss-Jordan elimination:


1. With n pivots, elimination solves all the equations $Ax_i = e_i$. The columns $x_i$ go
into $A^{-1}$. Then $AA^{-1} = I$ and $A^{-1}$ is at least a right-inverse.


2. Elimination is really a sequence of multiplications by E’s and P’s and $D^{-1}$:

Left-inverse C    $CA = (D^{-1} \cdots E \cdots P \cdots E)A = I$.    (9)


$D^{-1}$ divides by the pivots. The matrices E produce zeros below and above the pivots.
P will exchange rows if needed (see Section 2.7). The product matrix in equation (9) is
evidently a left-inverse of A. With n pivots we have reached $A^{-1}A = I$.

The right-inverse equals the left-inverse. That was Note 2 at the start of this section.
So a square matrix with a full set of pivots will always have a two-sided inverse.

Reasoning in reverse will now show that A must have n pivots if AC = I.


1. If A doesn’t have n pivots, elimination will lead to a zero row.

2. Those elimination steps are taken by an invertible M. So a row of MA is zero.

3. If AC = I had been possible, then MAC = M. The zero row of MA, times C,
gives a zero row of M itself.

4. An invertible matrix M can’t have a zero row! A must have n pivots if AC = I.


That argument took four steps, but the outcome is short and important. C is $A^{-1}$.
Elimination gives a complete test for invertibility of a square matrix. $A^{-1}$ exists (and
Gauss-Jordan finds it) exactly when A has n pivots. The argument above shows more:


If AC = I then CA = I and $C = A^{-1}$    (10)


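Example 6 If L is lower triangular with 1’s on the diagonal, so is $L^{-1}$.
A triangular matrix is invertible if and only if no diagonal entries are zero.


Here L has 1’s so $L^{-1}$ also has 1’s. Use the Gauss-Jordan method to construct $L^{-1}$ from
$E_{32}$, $E_{31}$, $E_{21}$. Notice how $L^{-1}$ contains the strange entry 11, from 3 times 5 minus 4.


Gauss-Jordan on triangular L

$$\left[\begin{array}{rrr|rrr} 1 & 0 & 0 & 1 & 0 & 0 \\ 3 & 1 & 0 & 0 & 1 & 0 \\ 4 & 5 & 1 & 0 & 0 & 1 \end{array}\right] = \begin{bmatrix} L & I \end{bmatrix}$$

$$\to \left[\begin{array}{rrr|rrr} 1 & 0 & 0 & 1 & 0 & 0 \\ 0 & 1 & 0 & -3 & 1 & 0 \\ 0 & 5 & 1 & -4 & 0 & 1 \end{array}\right] \quad \begin{matrix} (\text{3 times row 1 from row 2}) \\ (\text{4 times row 1 from row 3}) \\ (\text{then 5 times row 2 from row 3}) \end{matrix}$$

$$\text{The inverse is still triangular} \quad \to \left[\begin{array}{rrr|rrr} 1 & 0 & 0 & 1 & 0 & 0 \\ 0 & 1 & 0 & -3 & 1 & 0 \\ 0 & 0 & 1 & 11 & -5 & 1 \end{array}\right] = \begin{bmatrix} I & L^{-1} \end{bmatrix}.$$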


Recognizing an Invertible Matrix 


Normally, it takes work to decide if a matrix is invertible. The usual way is to find a full set
of nonzero pivots in elimination. (Then the nonzero determinant comes from multiplying
those pivots.) But for some matrices you can see quickly that they are invertible because
every number $a_{ii}$ on their main diagonal dominates the off-diagonal part of that row i.


Diagonally dominant matrices are invertible. Each $a_{ii}$ on the diagonal is larger than
the total sum along the rest of row i. On every row,


$$|a_{ii}| > \sum_{j \neq i} |a_{ij}| \quad \text{means that} \quad |a_{ii}| > |a_{i1}| + \cdots (\text{skip } |a_{ii}|) \cdots + |a_{in}|. \tag{11}$$


Examples. A is diagonally dominant (3 > 2). B is not (but still invertible). C is singular.


$$A = \begin{bmatrix} 3 & 1 & 1 \\ 1 & 3 & 1 \\ 1 & 1 & 3 \end{bmatrix} \qquad B = \begin{bmatrix} 2 & 1 & 1 \\ 1 & 2 & 1 \\ 1 & 1 & 3 \end{bmatrix} \qquad C = \begin{bmatrix} 1 & 1 & 1 \\ 1 & 1 & 1 \\ 1 & 1 & 3 \end{bmatrix}$$
Reasoning. Take any nonzero vector x. Suppose its largest component is $|x_i|$. Then
Ax = 0 is impossible, because row i of Ax = 0 would need


$$a_{i1}x_1 + \cdots + a_{ii}x_i + \cdots + a_{in}x_n = 0.$$


Those can’t add to zero when A is diagonally dominant! The size of $a_{ii}x_i$ (that one
particular term) is greater than all the other terms combined:


$$\text{All } |x_j| \le |x_i| \qquad \Big| \sum_{j \neq i} a_{ij}x_j \Big| \le \sum_{j \neq i} |a_{ij}|\,|x_j| \le \sum_{j \neq i} |a_{ij}|\,|x_i| < |a_{ii}|\,|x_i| \quad \text{because } a_{ii} \text{ dominates}$$


This shows that Ax = 0 is only possible when x = 0. So A is invertible. The example B
was also invertible but not quite diagonally dominant: 2 is not larger than 1 + 1.
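The test in (11) is a one-line check per row. The sketch below applies it to the three example matrices A, B, C. The function names are illustrative, and the 3 by 3 determinant (by cofactors along row 1) is included only to confirm which matrices are invertible.

```python
def is_diagonally_dominant(M):
    """True when every |a_ii| exceeds the sum of the other |a_ij| in row i."""
    return all(abs(row[i]) > sum(abs(x) for j, x in enumerate(row) if j != i)
               for i, row in enumerate(M))

def det3(M):
    """Determinant of a 3 by 3 matrix by cofactors of the first row."""
    (a, b, c), (d, e, f), (g, h, i) = M
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

A = [[3, 1, 1], [1, 3, 1], [1, 1, 3]]
B = [[2, 1, 1], [1, 2, 1], [1, 1, 3]]
C = [[1, 1, 1], [1, 1, 1], [1, 1, 3]]

for name, M in (("A", A), ("B", B), ("C", C)):
    print(name, is_diagonally_dominant(M), det3(M))
```

Dominance is sufficient but not necessary: B fails the test and still has a nonzero determinant.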


= REVIEW OF THE KEY IDEAS =


1. The inverse matrix gives $AA^{-1} = I$ and $A^{-1}A = I$.

2. A is invertible if and only if it has n pivots (row exchanges allowed).

3. Important. If Ax = 0 for a nonzero vector x, then A has no inverse.

4. The inverse of AB is the reverse product $B^{-1}A^{-1}$. And $(ABC)^{-1} = C^{-1}B^{-1}A^{-1}$.

5. The Gauss-Jordan method solves $AA^{-1} = I$ to find the n columns of $A^{-1}$.
The augmented matrix [A I] is row-reduced to [I $A^{-1}$].

6. Diagonally dominant matrices are invertible. Each $|a_{ii}|$ dominates its row.


= WORKED EXAMPLES = 


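2.5 A The inverse of a triangular difference matrix A is a triangular sum matrix S:


$$\begin{bmatrix} A & I \end{bmatrix} = \left[\begin{array}{rrr|rrr} 1 & 0 & 0 & 1 & 0 & 0 \\ -1 & 1 & 0 & 0 & 1 & 0 \\ 0 & -1 & 1 & 0 & 0 & 1 \end{array}\right] \to \left[\begin{array}{rrr|rrr} 1 & 0 & 0 & 1 & 0 & 0 \\ 0 & 1 & 0 & 1 & 1 & 0 \\ 0 & -1 & 1 & 0 & 0 & 1 \end{array}\right]$$

$$\to \left[\begin{array}{rrr|rrr} 1 & 0 & 0 & 1 & 0 & 0 \\ 0 & 1 & 0 & 1 & 1 & 0 \\ 0 & 0 & 1 & 1 & 1 & 1 \end{array}\right] = \begin{bmatrix} I & A^{-1} \end{bmatrix} = \begin{bmatrix} I & \text{sum matrix} \end{bmatrix}.$$


If I change $a_{13}$ to −1, then all rows of A add to zero. The equation Ax = 0 will now
have the nonzero solution x = (1, 1, 1). A clear signal: This new A can’t be inverted.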
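2.5 B Three of these matrices are invertible, and three are singular. Find the inverse
when it exists. Give reasons for noninvertibility (zero determinant, too few pivots, nonzero
solution to Ax = 0) for the other three. The matrices are in the order A, B, C, D, S, E:


$$\begin{bmatrix} 4 & 3 \\ 8 & 6 \end{bmatrix} \quad \begin{bmatrix} 4 & 3 \\ 8 & 7 \end{bmatrix} \quad \begin{bmatrix} 6 & 6 \\ 6 & 0 \end{bmatrix} \quad \begin{bmatrix} 6 & 6 \\ 6 & 6 \end{bmatrix} \quad \begin{bmatrix} 1 & 0 & 0 \\ 1 & 1 & 0 \\ 1 & 1 & 1 \end{bmatrix} \quad \begin{bmatrix} 1 & 1 & 1 \\ 1 & 1 & 0 \\ 1 & 1 & 1 \end{bmatrix}$$

Solution

$$B^{-1} = \frac{1}{4} \begin{bmatrix} 7 & -3 \\ -8 & 4 \end{bmatrix} \qquad C^{-1} = \frac{1}{36} \begin{bmatrix} 0 & 6 \\ 6 & -6 \end{bmatrix} \qquad S^{-1} = \begin{bmatrix} 1 & 0 & 0 \\ -1 & 1 & 0 \\ 0 & -1 & 1 \end{bmatrix}$$


A is not invertible because its determinant is 4 · 6 − 3 · 8 = 24 − 24 = 0. D is not
invertible because there is only one pivot; the second row becomes zero when the first row
is subtracted. E has two equal rows (and the second column minus the first column is zero).
In other words Ex = 0 has the solution x = (−1, 1, 0).


Of course all three reasons for noninvertibility would apply to each of A, D, E.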
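2.5 C Apply the Gauss-Jordan method to invert this triangular “Pascal matrix” L.


You see Pascal’s triangle: adding each entry to the entry on its left gives the entry below.
The entries of L are “binomial coefficients”. The next row would be 1, 4, 6, 4, 1.


$$\text{Triangular Pascal matrix} \quad L = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 1 & 1 & 0 & 0 \\ 1 & 2 & 1 & 0 \\ 1 & 3 & 3 & 1 \end{bmatrix} = \texttt{abs(pascal(4,1))}$$

Solution Gauss-Jordan starts with [L I] and produces zeros by subtracting row 1:

$$\begin{bmatrix} L & I \end{bmatrix} = \left[\begin{array}{rrrr|rrrr} 1 & 0 & 0 & 0 & 1 & 0 & 0 & 0 \\ 1 & 1 & 0 & 0 & 0 & 1 & 0 & 0 \\ 1 & 2 & 1 & 0 & 0 & 0 & 1 & 0 \\ 1 & 3 & 3 & 1 & 0 & 0 & 0 & 1 \end{array}\right] \to \left[\begin{array}{rrrr|rrrr} 1 & 0 & 0 & 0 & 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 & -1 & 1 & 0 & 0 \\ 0 & 2 & 1 & 0 & -1 & 0 & 1 & 0 \\ 0 & 3 & 3 & 1 & -1 & 0 & 0 & 1 \end{array}\right]$$


The next stage creates zeros below the second pivot, using multipliers 2 and 3. Then the
last stage subtracts 3 times the new row 3 from the new row 4:


$$\to \left[\begin{array}{rrrr|rrrr} 1 & 0 & 0 & 0 & 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 & -1 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 & 1 & -2 & 1 & 0 \\ 0 & 0 & 3 & 1 & 2 & -3 & 0 & 1 \end{array}\right] \to \left[\begin{array}{rrrr|rrrr} 1 & 0 & 0 & 0 & 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 & -1 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 & 1 & -2 & 1 & 0 \\ 0 & 0 & 0 & 1 & -1 & 3 & -3 & 1 \end{array}\right] = \begin{bmatrix} I & L^{-1} \end{bmatrix}.$$


All the pivots were 1! So we didn’t need to divide rows by pivots to get I. The inverse
matrix $L^{-1}$ looks like L itself, except odd-numbered diagonals have minus signs.
The same pattern continues to n by n Pascal matrices. $L^{-1}$ has “alternating diagonals”.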
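The alternating-sign pattern can be checked for larger sizes. In this sketch the helper names are illustrative, and the closed form $(-1)^{i+j}\binom{i}{j}$ for the entries of $L^{-1}$ is my way of writing "minus signs on odd-numbered diagonals". It relies on `math.comb` from the Python standard library (Python 3.8 or later), which returns 0 above the diagonal.

```python
from math import comb

def pascal_lower(n):
    """Lower triangular Pascal matrix: entry (i, j) is the binomial coefficient C(i, j)."""
    return [[comb(i, j) for j in range(n)] for i in range(n)]

def pascal_lower_inverse(n):
    """Same entries with alternating signs on the diagonals: (-1)^(i+j) C(i, j)."""
    return [[(-1) ** (i + j) * comb(i, j) for j in range(n)] for i in range(n)]

def matmul(X, Y):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]

def identity(n):
    return [[int(i == j) for j in range(n)] for i in range(n)]

L = pascal_lower(4)
L_inv = pascal_lower_inverse(4)
print(L_inv)
print(all(matmul(pascal_lower(n), pascal_lower_inverse(n)) == identity(n)
          for n in range(1, 9)))
```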
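Problem Set 2.5


1 Find the inverses (directly or from the 2 by 2 formula) of A, B, C:


$$A = \begin{bmatrix} 0 & 3 \\ 4 & 0 \end{bmatrix} \quad \text{and} \quad B = \begin{bmatrix} 2 & 0 \\ 4 & 2 \end{bmatrix} \quad \text{and} \quad C = \begin{bmatrix} 3 & 4 \\ 5 & 7 \end{bmatrix}.$$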

2 For these “permutation matrices” find $P^{-1}$ by trial and error (with 1’s and 0’s):

$$P = \begin{bmatrix} 0 & 0 & 1 \\ 0 & 1 & 0 \\ 1 & 0 & 0 \end{bmatrix} \quad \text{and} \quad P = \begin{bmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 1 & 0 & 0 \end{bmatrix}.$$


3 Solve for the first column (x, y) and second column (t, z) of $A^{-1}$:

$$\begin{bmatrix} 10 & 20 \\ 20 & 50 \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} 1 \\ 0 \end{bmatrix} \quad \text{and} \quad \begin{bmatrix} 10 & 20 \\ 20 & 50 \end{bmatrix} \begin{bmatrix} t \\ z \end{bmatrix} = \begin{bmatrix} 0 \\ 1 \end{bmatrix}.$$


4 Show that $\begin{bmatrix} 1 & 2 \\ 3 & 6 \end{bmatrix}$ is not invertible by trying to solve $AA^{-1} = I$ for column 1 of $A^{-1}$:


$$\begin{bmatrix} 1 & 2 \\ 3 & 6 \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} 1 \\ 0 \end{bmatrix} \qquad \left(\begin{matrix} \text{For a different } A, \text{ could column 1 of } A^{-1} \\ \text{be possible to find but not column 2?} \end{matrix}\right)$$


5 Find an upper triangular U (not diagonal) with $U^2 = I$ which gives $U = U^{-1}$.

6 (a) If A is invertible and AB = AC, prove quickly that B = C.
(b) If $A = \begin{bmatrix} 1 & 1 \\ 1 & 1 \end{bmatrix}$, find two different matrices such that AB = AC.


7 (Important) If A has row 1 + row 2 = row 3, show that A is not invertible:


(a) Explain why Ax = (0, 0, 1) cannot have a solution. Add eqn 1 + eqn 2.
(b) Which right sides $(b_1, b_2, b_3)$ might allow a solution to Ax = b?
(c) In elimination, what happens to equation 3?

8 If A has column 1 + column 2 = column 3, show that A is not invertible:


(a) Find a nonzero solution x to Ax = 0. The matrix is 3 by 3.
(b) Elimination keeps column 1 + column 2 = column 3. Explain why there is no
third pivot.


9 Suppose A is invertible and you exchange its first two rows to reach B. Is the new
matrix B invertible? How would you find $B^{-1}$ from $A^{-1}$?
10 Find the inverses (in any legal way) of


$$A = \begin{bmatrix} 0 & 0 & 0 & 2 \\ 0 & 0 & 3 & 0 \\ 0 & 4 & 0 & 0 \\ 5 & 0 & 0 & 0 \end{bmatrix} \quad \text{and} \quad B = \begin{bmatrix} 3 & 2 & 0 & 0 \\ 4 & 3 & 0 & 0 \\ 0 & 0 & 6 & 5 \\ 0 & 0 & 7 & 6 \end{bmatrix}.$$


11 (a) Find invertible matrices A and B such that A + B is not invertible.
(b) Find singular matrices A and B such that A + B is invertible.


12 If the product C = AB is invertible (A and B are square), then A itself is invertible.
Find a formula for $A^{-1}$ that involves $C^{-1}$ and B.


13 If the product M = ABC of three square matrices is invertible, then B is invertible.
(So are A and C.) Find a formula for $B^{-1}$ that involves $M^{-1}$ and A and C.


14 If you add row 1 of A to row 2 to get B, how do you find $B^{-1}$ from $A^{-1}$?


Notice the order. The inverse of $B = \begin{bmatrix} 1 & 0 \\ 1 & 1 \end{bmatrix} \begin{bmatrix} A \end{bmatrix}$ is ____.


15 Prove that a matrix with a column of zeros cannot have an inverse.


16 Multiply $\begin{bmatrix} a & b \\ c & d \end{bmatrix}$ times $\begin{bmatrix} d & -b \\ -c & a \end{bmatrix}$. What is the inverse of each matrix if $ad \neq bc$?


17 (a) What 3 by 3 matrix E has the same effect as these three steps? Subtract row 1
from row 2, subtract row 1 from row 3, then subtract row 2 from row 3.


(b) What single matrix L has the same effect as these three reverse steps? Add row
2 to row 3, add row 1 to row 3, then add row 1 to row 2.


18 If B is the inverse of $A^2$, show that AB is the inverse of A.


19 Find the numbers a and b that give the inverse of 5 * eye(4) − ones(4, 4):


$$\begin{bmatrix} 4 & -1 & -1 & -1 \\ -1 & 4 & -1 & -1 \\ -1 & -1 & 4 & -1 \\ -1 & -1 & -1 & 4 \end{bmatrix}^{-1} = \begin{bmatrix} a & b & b & b \\ b & a & b & b \\ b & b & a & b \\ b & b & b & a \end{bmatrix}.$$


What are a and b in the inverse of 6 * eye(5) − ones(5, 5)?

20 Show that A = 4 * eye(4) − ones(4, 4) is not invertible: Multiply A * ones(4, 1).


21 There are sixteen 2 by 2 matrices whose entries are 1’s and 0’s. How many of them
are invertible?
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Questions 22-28 are about the Gauss-Jordan method for calculating A~?. 


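22 Change I into $A^{-1}$ as you reduce A to I (by row operations):

$$\begin{bmatrix} A & I \end{bmatrix} = \begin{bmatrix} 1 & 3 & 1 & 0 \\ 2 & 7 & 0 & 1 \end{bmatrix} \quad\text{and}\quad \begin{bmatrix} A & I \end{bmatrix} = \begin{bmatrix} 1 & 4 & 1 & 0 \\ 3 & 9 & 0 & 1 \end{bmatrix}$$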


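23 Follow the 3 by 3 text example but with plus signs in A. Eliminate above and below
the pivots to reduce [A I] to [I $A^{-1}$]:

$$\begin{bmatrix} A & I \end{bmatrix} = \begin{bmatrix} 2 & 1 & 0 & 1 & 0 & 0 \\ 1 & 2 & 1 & 0 & 1 & 0 \\ 0 & 1 & 2 & 0 & 0 & 1 \end{bmatrix}.$$

24 Use Gauss-Jordan elimination on [U I] to find the upper triangular $U^{-1}$:

$$UU^{-1} = I \qquad \begin{bmatrix} 1 & a & b \\ 0 & 1 & c \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} & & \\ x_1 & x_2 & x_3 \\ & & \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}.$$

25 Find $A^{-1}$ and $B^{-1}$ (if they exist) by elimination on [A I] and [B I]:

$$A = \begin{bmatrix} 2 & 1 & 1 \\ 1 & 2 & 1 \\ 1 & 1 & 2 \end{bmatrix} \quad\text{and}\quad B = \begin{bmatrix} 2 & -1 & -1 \\ -1 & 2 & -1 \\ -1 & -1 & 2 \end{bmatrix}.$$

26 What three matrices $E_{21}$ and $E_{12}$ and $D^{-1}$ reduce $A = \begin{bmatrix} 1 & 2 \\ 2 & 6 \end{bmatrix}$ to the identity matrix?
Multiply $D^{-1}E_{12}E_{21}$ to find $A^{-1}$.

27 Invert these matrices A by the Gauss-Jordan method starting with [A I]:

$$A = \begin{bmatrix} 1 & 0 & 0 \\ 2 & 1 & 3 \\ 0 & 0 & 1 \end{bmatrix} \quad\text{and}\quad A = \begin{bmatrix} 1 & 1 & 1 \\ 1 & 2 & 2 \\ 1 & 2 & 3 \end{bmatrix}.$$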


28 Exchange rows and continue with Gauss-Jordan to find $A^{-1}$:

$$\begin{bmatrix} A & I \end{bmatrix} = \begin{bmatrix} 0 & 2 & 1 & 0 \\ 2 & 2 & 0 & 1 \end{bmatrix}.$$


29 True or false (with a counterexample if false and a reason if true): 


(a) A 4 by 4 matrix with a row of zeros is not invertible. 
(b) Every matrix with 1’s down the main diagonal is invertible. 


(c) If A is invertible then $A^{-1}$ and $A^2$ are invertible.

30 (Recommended) Prove that A is invertible if $a \neq 0$ and $a \neq b$ (find the pivots or
$A^{-1}$). Then find three numbers c so that C is not invertible:

$$A = \begin{bmatrix} a & b & b \\ a & a & b \\ a & a & a \end{bmatrix} \qquad C = \begin{bmatrix} 2 & c & c \\ c & c & c \\ 8 & 7 & c \end{bmatrix}.$$

31 This matrix has a remarkable inverse. Find $A^{-1}$ by elimination on [A I]. Extend
to a 5 by 5 “alternating matrix” and guess its inverse; then multiply to confirm.

$$\text{Invert } A = \begin{bmatrix} 1 & -1 & 1 & -1 \\ 0 & 1 & -1 & 1 \\ 0 & 0 & 1 & -1 \\ 0 & 0 & 0 & 1 \end{bmatrix} \text{ and solve } Ax = (1, 1, 1, 1).$$

32 Suppose the matrices P and Q have the same rows as I but in any order. They are
“permutation matrices”. Show that P - Q is singular by solving (P - Q)x = 0.

33 Find and check the inverses (assuming they exist) of these block matrices:

$$\begin{bmatrix} I & 0 \\ C & I \end{bmatrix} \qquad \begin{bmatrix} A & 0 \\ C & D \end{bmatrix} \qquad \begin{bmatrix} 0 & I \\ I & D \end{bmatrix}.$$

34 Could a 4 by 4 matrix A be invertible if every row contains the numbers 0, 1, 2, 3 in
some order? What if every row of B contains 0, 1, 2, -3 in some order?

35 In the Worked Example 2.5 C, the triangular Pascal matrix L has $L^{-1} = DLD$,
where the diagonal matrix D has alternating entries 1, -1, 1, -1. Then LDLD = I,
so what is the inverse of LD = pascal(4, 1)?

36 The Hilbert matrices have $H_{ij} = 1/(i + j - 1)$. Ask MATLAB for the exact 6 by 6
inverse invhilb(6). Then ask it to compute inv(hilb(6)). How can these be different,
when the computer never makes mistakes?

37 (a) Use inv(P) to invert MATLAB’s 4 by 4 symmetric matrix P = pascal(4).
(b) Create Pascal’s lower triangular L = abs(pascal(4, 1)) and test $P = LL^T$.

38 If A = ones(4) and b = rand(4, 1), how does MATLAB tell you that Ax = b has no
solution? For the special b = ones(4, 1), which solution to Ax = b is found by A\b?

Challenge Problems

39 (Recommended) A is a 4 by 4 matrix with 1’s on the diagonal and -a, -b, -c on the
diagonal above. Find $A^{-1}$ for this bidiagonal matrix.
40 Suppose $E_1, E_2, E_3$ are 4 by 4 identity matrices, except $E_1$ has a, b, c in column 1
and $E_2$ has d, e in column 2 and $E_3$ has f in column 3 (below the 1’s). Multiply
$L = E_1E_2E_3$ to show that all these nonzeros are copied into L.

$E_1E_2E_3$ is in the opposite order from elimination (because $E_3$ is acting first). But
$E_1E_2E_3 = L$ is in the correct order to invert elimination and recover A.

41 Second difference matrices have beautiful inverses if they start with $T_{11} = 1$
(instead of $K_{11} = 2$). Here is the 3 by 3 tridiagonal matrix T and its inverse:

$$T = \begin{bmatrix} 1 & -1 & 0 \\ -1 & 2 & -1 \\ 0 & -1 & 2 \end{bmatrix} \qquad T^{-1} = \begin{bmatrix} 3 & 2 & 1 \\ 2 & 2 & 1 \\ 1 & 1 & 1 \end{bmatrix}$$

One approach is Gauss-Jordan elimination on [T I]. I would rather write T as the
product of first differences L times U. The inverses of L and U in Worked Example
2.5 A are sum matrices, so here are T = LU and $T^{-1} = U^{-1}L^{-1}$:

$$T = \underbrace{\begin{bmatrix} 1 & 0 & 0 \\ -1 & 1 & 0 \\ 0 & -1 & 1 \end{bmatrix}}_{\text{difference}} \underbrace{\begin{bmatrix} 1 & -1 & 0 \\ 0 & 1 & -1 \\ 0 & 0 & 1 \end{bmatrix}}_{\text{difference}} \qquad T^{-1} = \underbrace{\begin{bmatrix} 1 & 1 & 1 \\ 0 & 1 & 1 \\ 0 & 0 & 1 \end{bmatrix}}_{\text{sum}} \underbrace{\begin{bmatrix} 1 & 0 & 0 \\ 1 & 1 & 0 \\ 1 & 1 & 1 \end{bmatrix}}_{\text{sum}}$$

Question. (4 by 4) What are the pivots of T? What is its 4 by 4 inverse?
The reverse order UL gives what matrix $T^*$? What is the inverse of $T^*$?

42 Here are two more difference matrices, both important. But are they invertible?

$$\text{Cyclic } C = \begin{bmatrix} 2 & -1 & 0 & -1 \\ -1 & 2 & -1 & 0 \\ 0 & -1 & 2 & -1 \\ -1 & 0 & -1 & 2 \end{bmatrix} \qquad \text{Free ends } F = \begin{bmatrix} 1 & -1 & 0 & 0 \\ -1 & 2 & -1 & 0 \\ 0 & -1 & 2 & -1 \\ 0 & 0 & -1 & 1 \end{bmatrix}$$

43 Elimination for a block matrix: When you multiply the first block row [A B] by
$CA^{-1}$ and subtract from the second row [C D], the “Schur complement” S appears:

$$\begin{bmatrix} I & 0 \\ -CA^{-1} & I \end{bmatrix} \begin{bmatrix} A & B \\ C & D \end{bmatrix} = \begin{bmatrix} A & B \\ 0 & S \end{bmatrix} \qquad \begin{array}{l} A \text{ and } D \text{ are square} \\ S = D - CA^{-1}B. \end{array}$$

Multiply on the right to subtract $A^{-1}B$ times block column 1 from block column 2.

$$\begin{bmatrix} A & B \\ 0 & S \end{bmatrix} \begin{bmatrix} I & -A^{-1}B \\ 0 & I \end{bmatrix} = \ ? \qquad \text{Find } S \text{ for } \begin{bmatrix} A & B \\ C & D \end{bmatrix} = \left[\begin{array}{c|cc} 2 & 3 & 3 \\ \hline 4 & 1 & 0 \\ 4 & 0 & 1 \end{array}\right].$$

The block pivots are A and S. If they are invertible, so is [A B; C D].

44 How does the identity A(I + BA) = (I + AB)A connect the inverses of I + BA
and I + AB? Those are both invertible or both singular: not obvious.




2.6 Elimination = Factorization: A = LU 


1 Each elimination step $E_{ij}$ is inverted by $L_{ij}$. Off the main diagonal change $-\ell_{ij}$ to $+\ell_{ij}$.


2 The whole forward elimination process (with no row exchanges) is inverted by L:
$L = (L_{21}L_{31} \cdots L_{n1})(L_{32} \cdots L_{n2})(L_{43} \cdots L_{n3}) \cdots (L_{n\,n-1})$.


3 That product matrix L is still lower triangular. Every multiplier $\ell_{ij}$ is in row i, column j.
4 The original A is recovered from U by A = LU = (lower triangular) (upper triangular).


5 Elimination on Ax = b reaches Ux = c. Then back-substitution solves Ux = c.


6 Solving a triangular system takes $n^2/2$ multiply-subtracts. Elimination to find U takes $n^3/3$.


Students often say that mathematics courses are too theoretical. Well, not this section. 
It is almost purely practical. The goal is to describe Gaussian elimination in the most 
useful way. Many key ideas of linear algebra, when you look at them closely, are really 
factorizations of a matrix. The original matrix A becomes the product of two or three 
special matrices. The first factorization—also the most important in practice—comes now 
from elimination. The factors L and U are triangular matrices. The factorization that 
comes from elimination is A = LU. 

We already know U, the upper triangular matrix with the pivots on its diagonal. The 
elimination steps take A to U. We will show how reversing those steps (taking U back 
to A) is achieved by a lower triangular L. The entries of L are exactly the multipliers
$\ell_{ij}$, which multiplied the pivot row j when it was subtracted from row i.

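Start with a 2 by 2 example. The matrix A contains 2, 1, 6, 8. The number to eliminate
is 6. Subtract 3 times row 1 from row 2. That step is $E_{21}$ in the forward direction with
multiplier $\ell_{21} = 3$. The return step from U to A is $L = E_{21}^{-1}$ (an addition using +3):


$$\text{Forward from } A \text{ to } U: \quad E_{21}A = \begin{bmatrix} 1 & 0 \\ -3 & 1 \end{bmatrix} \begin{bmatrix} 2 & 1 \\ 6 & 8 \end{bmatrix} = \begin{bmatrix} 2 & 1 \\ 0 & 5 \end{bmatrix} = U$$


$$\text{Back from } U \text{ to } A: \quad E_{21}^{-1}U = \begin{bmatrix} 1 & 0 \\ 3 & 1 \end{bmatrix} \begin{bmatrix} 2 & 1 \\ 0 & 5 \end{bmatrix} = \begin{bmatrix} 2 & 1 \\ 6 & 8 \end{bmatrix} = A.$$


The second line is our factorization LU = A. Instead of $E_{21}^{-1}$ we write L. Move now to
larger matrices with many E’s. Then L will include all their inverses.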

Each step from A to U multiplies by a matrix $E_{ij}$ to produce zero in the (i, j) position.
To keep this clear, we stay with the most frequent case, when no row exchanges are
involved. If A is 3 by 3, we multiply by $E_{21}$ and $E_{31}$ and $E_{32}$. The multipliers $\ell_{ij}$ produce
zeros in the (2, 1) and (3, 1) and (3, 2) positions, all below the diagonal. Elimination ends
with the upper triangular U.
Now move those E’s onto the other side, where their inverses multiply U:

$(E_{32}E_{31}E_{21})A = U$ becomes $A = (E_{21}^{-1}E_{31}^{-1}E_{32}^{-1})U$ which is A = LU.


The inverses go in opposite order, as they must. That product of three inverses is L. 
We have reached A = LU. Now we stop to understand it. 


Explanation and Examples 


First point: Every inverse matrix $E^{-1}$ is lower triangular. Its off-diagonal entry is $\ell_{ij}$,
to undo the subtraction produced by $-\ell_{ij}$. The main diagonals of E and $E^{-1}$ contain 1’s.
Our example above had $\ell_{21} = 3$ and $E = \begin{bmatrix} 1 & 0 \\ -3 & 1 \end{bmatrix}$ and $L = E^{-1} = \begin{bmatrix} 1 & 0 \\ 3 & 1 \end{bmatrix}$.


Second point: Equation (2) shows a lower triangular matrix (the product of the $E_{ij}$)
multiplying A. It also shows all the $E_{ij}^{-1}$ multiplying U to bring back A. This lower
triangular product of inverses is L.

One reason for working with the inverses is that we want to factor A, not U. The 
“inverse form” gives A = LU. Another reason is that we get something extra, almost more 
than we deserve. This is the third point, showing that L is exactly right. 


Third point: Each multiplier $\ell_{ij}$ goes directly into its i, j position, unchanged, in the
product of inverses which is L. Usually matrix multiplication will mix up all the numbers.
Here that doesn’t happen. The order is right for the inverse matrices, to keep the $\ell$’s
unchanged. The reason is given below in equation (2).

Since each $E^{-1}$ has 1’s down its diagonal, the final good point is that L does too.


A = LU   This is elimination without row exchanges. The upper triangular U
has the pivots on its diagonal. The lower triangular L has all 1’s on
its diagonal. The multipliers $\ell_{ij}$ are below the diagonal of L.


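Example 1 Elimination subtracts $\frac{1}{2}$ times row 1 from row 2. The last step subtracts $\frac{2}{3}$
times row 2 from row 3. The lower triangular L has $\ell_{21} = \frac{1}{2}$ and $\ell_{32} = \frac{2}{3}$. Multiplying LU
produces A:


$$A = \begin{bmatrix} 2 & 1 & 0 \\ 1 & 2 & 1 \\ 0 & 1 & 2 \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 \\ \frac{1}{2} & 1 & 0 \\ 0 & \frac{2}{3} & 1 \end{bmatrix} \begin{bmatrix} 2 & 1 & 0 \\ 0 & \frac{3}{2} & 1 \\ 0 & 0 & \frac{4}{3} \end{bmatrix} = LU.$$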


The (3, 1) multiplier is zero because the (3, 1) entry in A is zero. No operation needed. 


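Example 2 Change the top left entry from 2 in A to 1 in B. The pivots all become 1.
The multipliers are all 1. That pattern continues when B is 4 by 4:


$$\text{Special pattern} \qquad B = \begin{bmatrix} 1 & 1 & 0 & 0 \\ 1 & 2 & 1 & 0 \\ 0 & 1 & 2 & 1 \\ 0 & 0 & 1 & 2 \end{bmatrix} = \begin{bmatrix} 1 & & & \\ 1 & 1 & & \\ 0 & 1 & 1 & \\ 0 & 0 & 1 & 1 \end{bmatrix} \begin{bmatrix} 1 & 1 & 0 & 0 \\ & 1 & 1 & 0 \\ & & 1 & 1 \\ & & & 1 \end{bmatrix}.$$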
These LU examples are showing something extra, which is very important in practice.
Assume no row exchanges. When can we predict zeros in L and U? 


When a row of A starts with zeros, so does that row of L. 
When a column of A starts with zeros, so does that column of U. 


If a row starts with zero, we don’t need an elimination step. L has a zero, which saves 
computer time. Similarly, zeros at the start of a column survive into U. But please realize: 
Zeros in the middle of a matrix are likely to be filled in, while elimination sweeps forward. 
We now explain why L has the multipliers $\ell_{ij}$ in position, with no mix-up.


The key reason why A equals LU: Ask yourself about the pivot rows that are subtracted 
from lower rows. Are they the original rows of A? No, elimination probably changed them. 
Are they rows of U? Yes, the pivot rows never change again. When computing the third 
row of U, we subtract multiples of earlier rows of U (not rows of A!): 


Row 3 of U = (Row 3 of A) $- \ell_{31}$(Row 1 of U) $- \ell_{32}$(Row 2 of U). (1)


Rewrite this equation to see that the row $[\ell_{31} \ \ \ell_{32} \ \ 1]$ is multiplying the matrix U:

(Row 3 of A) = $\ell_{31}$(Row 1 of U) + $\ell_{32}$(Row 2 of U) + 1(Row 3 of U). (2)


This is exactly row 3 of A = LU. That row of L holds $\ell_{31}, \ell_{32}, 1$. All rows look like this,
whatever the size of A. With no row exchanges, we have A = LU.
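
Equations (1) and (2) translate almost line for line into code. The sketch below is in Python, not one of the book’s Teaching Codes, and the name lu_no_exchanges is hypothetical. It assumes a square matrix with no zero pivot, and it uses exact fractions so the multipliers of Example 1 appear without roundoff.

```python
from fractions import Fraction

def lu_no_exchanges(A):
    """Factor a square matrix A into (L, U) by elimination, with no row exchanges."""
    n = len(A)
    U = [[Fraction(x) for x in row] for row in A]
    L = [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]
    for j in range(n):
        if U[j][j] == 0:
            raise ValueError("zero pivot: a row exchange would be needed")
        for i in range(j + 1, n):
            ell = U[i][j] / U[j][j]          # multiplier l_ij
            L[i][j] = ell                    # stored in position (i, j), unchanged
            # subtract l_ij times pivot row j (already a row of U) from row i
            U[i] = [U[i][k] - ell * U[j][k] for k in range(n)]
    return L, U

def matmul(X, Y):
    """Ordinary matrix product of two lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

A = [[2, 1, 0], [1, 2, 1], [0, 1, 2]]       # the matrix of Example 1
L, U = lu_no_exchanges(A)
print([[str(v) for v in row] for row in L])
print([[str(v) for v in row] for row in U])
print(matmul(L, U) == A)                     # L times U recovers A
```

Each multiplier is written into L once and never touched again, which is the “no mix-up” property. A professional code would also exchange rows to pick larger pivots; this sketch deliberately does not.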


Better balance from LDU   A = LU is “unsymmetric” because U has the pivots on its
diagonal where L has 1’s. This is easy to change. Divide U by a diagonal matrix D that
contains the pivots. That leaves a new triangular matrix with 1’s on the diagonal:


$$\text{Split } U \text{ into} \quad \begin{bmatrix} d_1 & & & \\ & d_2 & & \\ & & \ddots & \\ & & & d_n \end{bmatrix} \begin{bmatrix} 1 & u_{12}/d_1 & u_{13}/d_1 & \cdots \\ & 1 & u_{23}/d_2 & \cdots \\ & & \ddots & \vdots \\ & & & 1 \end{bmatrix}.$$

It is convenient (but a little confusing) to keep the same letter U for this new triangular
matrix. It has 1’s on the diagonal (like L). Instead of the normal LU, the new form has
D in the middle: Lower triangular L times diagonal D times upper triangular U.


The triangular factorization can be written A = LU or A = LDU.


Whenever you see LDU, it is understood that U has 1’s on the diagonal. Each row is
divided by its first nonzero entry, the pivot. Then L and U are treated evenly in LDU:


$$\begin{bmatrix} 1 & 0 \\ 3 & 1 \end{bmatrix} \begin{bmatrix} 2 & 8 \\ 0 & 5 \end{bmatrix} \quad \text{splits further into} \quad \begin{bmatrix} 1 & 0 \\ 3 & 1 \end{bmatrix} \begin{bmatrix} 2 & \\ & 5 \end{bmatrix} \begin{bmatrix} 1 & 4 \\ 0 & 1 \end{bmatrix}. \quad (3)$$


The pivots 2 and 5 went into D. Dividing the rows by 2 and 5 left the rows [1 4] and
[0 1] in the new U with diagonal ones. The multiplier 3 is still in L.
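
The split of U into D times a new U is one division per row. Here is a minimal Python sketch (the function name split_pivots is hypothetical, and every pivot is assumed nonzero), applied to the U of equation (3).

```python
from fractions import Fraction

def split_pivots(U):
    """Return (D, U1) with U = D times U1, where U1 has 1's on its diagonal.

    Assumes U is square and upper triangular with nonzero pivots."""
    n = len(U)
    D = [[Fraction(U[i][i]) if i == j else Fraction(0) for j in range(n)]
         for i in range(n)]
    # divide row i of U by its pivot U[i][i]
    U1 = [[Fraction(U[i][j]) / U[i][i] for j in range(n)] for i in range(n)]
    return D, U1

D, U1 = split_pivots([[2, 8], [0, 5]])
print([[str(v) for v in row] for row in D])
print([[str(v) for v in row] for row in U1])
```

L is not touched by this step, so the multiplier 3 stays where elimination put it.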
My own lectures sometimes stop at this point. I go forward to 2.7. The next paragraphs
show how elimination codes are organized, and how long they take. If MATLAB (or any 
software) is available, you can measure the computing time by just counting the seconds. 


One Square System = Two Triangular Systems 


The matrix L contains our memory of Gaussian elimination. It holds the numbers that 
multiplied the pivot rows, before subtracting them from lower rows. When do we need this 
record and how do we use it in solving Ax = b? 

We need L as soon as there is a right side b. The factors L and U were completely 
decided by the left side (the matrix A). On the right side of Ax = b, we use $L^{-1}$ and
then $U^{-1}$. That Solve step deals with two triangular matrices.


1 Factor (into L and U, by elimination on the left side matrix A). 


2 Solve (forward elimination on b using L, then back substitution for x using U).


Earlier, we worked on A and b at the same time. No problem with that: just augment
to [A b]. But most computer codes keep the two sides separate. The memory of
elimination is held in L and U, to process b whenever we want to. The User’s Guide to 
LAPACK remarks that “This situation is so common and the savings are so important that 
no provision has been made for solving a single system with just one subroutine.” 


How does Solve work on b? First, apply forward elimination to the right side (the 
multipliers are stored in L, use them now). This changes b to a new right side c. We are 
really solving Lc = b. Then back substitution solves Ux = c as always. The original
system Ax = b is factored into two triangular systems:


Forward and backward   Solve Lc = b and then solve Ux = c. (4)


To see that x is correct, multiply Ux = c by L. Then LUx = Lc is just Ax = b.

To emphasize: There is nothing new about those steps. This is exactly what we have 
done all along. We were really solving the triangular system Lc = b as elimination went 
forward. Then back substitution produced x. An example shows what we actually did. 


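Example 3 Forward elimination (downward) on Ax = b ends at Ux = c:


$$Ax = b \qquad \begin{aligned} u + 2v &= 5 \\ 4u + 9v &= 21 \end{aligned} \qquad \text{becomes} \qquad \begin{aligned} u + 2v &= 5 \\ v &= 1 \end{aligned} \qquad Ux = c$$


The multiplier was 4, which is saved in L. The right side used that 4 to change 21 to 1:

$$Lc = b \quad \text{The lower triangular system } \begin{bmatrix} 1 & 0 \\ 4 & 1 \end{bmatrix} \begin{bmatrix} c \end{bmatrix} = \begin{bmatrix} 5 \\ 21 \end{bmatrix} \text{ gave } c = \begin{bmatrix} 5 \\ 1 \end{bmatrix}.$$


$$Ux = c \quad \text{The upper triangular system } \begin{bmatrix} 1 & 2 \\ 0 & 1 \end{bmatrix} \begin{bmatrix} x \end{bmatrix} = \begin{bmatrix} 5 \\ 1 \end{bmatrix} \text{ gives } x = \begin{bmatrix} 3 \\ 1 \end{bmatrix}.$$


L and U can go into the $n^2$ storage locations that originally held A (now forgettable).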
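
The two triangular solves of equation (4) are short loops. The following is a Python sketch under stated assumptions: the function names are hypothetical, L and U are given as lists of rows with nonzero diagonal entries, and exact fractions are used. The data are the L, U and b of Example 3.

```python
from fractions import Fraction

def forward_substitution(L, b):
    """Solve Lc = b from the top down (L is lower triangular)."""
    c = []
    for i in range(len(b)):
        known = sum(L[i][k] * c[k] for k in range(i))
        c.append((Fraction(b[i]) - known) / L[i][i])
    return c

def back_substitution(U, c):
    """Solve Ux = c from the bottom up (U is upper triangular)."""
    n = len(c)
    x = [Fraction(0)] * n
    for i in reversed(range(n)):
        known = sum(U[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (c[i] - known) / U[i][i]
    return x

L = [[1, 0], [4, 1]]
U = [[1, 2], [0, 1]]
c = forward_substitution(L, [5, 21])   # forward: b becomes c
x = back_substitution(U, c)            # backward: c becomes x
print([str(v) for v in c], [str(v) for v in x])
```

Once L and U are stored, a new right side b costs only these two loops. The factorization is not repeated.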


The Cost of Elimination 


A very practical question is cost—or computing time. We can solve 1000 equations on a 
PC. What if n = 100,000? (Is A dense or sparse?) Large systems come up all the time 
in scientific computing, where a three-dimensional problem can easily lead to a million 
unknowns. We can let the calculation run overnight, but we can’t leave it for 100 years. 

The first stage of elimination produces zeros below the first pivot in column 1. To
find each new entry below the pivot row requires one multiplication and one subtraction.
We will count this first stage as $n^2$ multiplications and $n^2$ subtractions. It is actually less,
$n^2 - n$, because row 1 does not change.

The next stage clears out the second column below the second pivot. The working
matrix is now of size n - 1. Estimate this stage by $(n-1)^2$ multiplications and subtractions.
The matrices are getting smaller as elimination goes forward. The rough count to reach U
is the sum of squares $n^2 + (n-1)^2 + \cdots + 2^2 + 1^2$.

There is an exact formula $\frac{1}{3}n(n + \frac{1}{2})(n + 1)$ for this sum of squares. When n is large,
the $\frac{1}{2}$ and the 1 are not important. The number that matters is $\frac{1}{3}n^3$. The sum of squares is
like the integral of $x^2$! The integral from 0 to n is $\frac{1}{3}n^3$:


Elimination on A requires about $\frac{1}{3}n^3$ multiplications and $\frac{1}{3}n^3$ subtractions.


What about the right side b? Going forward, we subtract multiples of $b_1$ from the lower
components $b_2, \ldots, b_n$. This is n - 1 steps. The second stage takes only n - 2 steps,
because $b_1$ is not involved. The last stage of forward elimination takes one step.

Now start back substitution. Computing $x_n$ uses one step (divide by the last pivot). The
next unknown uses two steps. When we reach $x_1$ it will require n steps (n - 1 substitutions
of the other unknowns, then division by the first pivot). The total count on the right side,
from b to c to x (forward to the bottom and back to the top), is exactly $n^2$:


$$[(n-1) + (n-2) + \cdots + 1] + [1 + 2 + \cdots + (n-1) + n] = n^2. \quad (5)$$

To see that sum, pair off (n - 1) with 1 and (n - 2) with 2. The pairings leave n terms,
each equal to n. That makes $n^2$. The right side costs a lot less than the left side!


Solve   Each right side needs $n^2$ multiplications and $n^2$ subtractions.
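
These counts are easy to confirm by brute force. The Python sketch below (hypothetical function names) adds up the steps stage by stage, exactly as the paragraphs above describe them, and compares the totals with the closed formulas. It counts one multiply-subtract pair as one step.

```python
def factor_count(n):
    """Exact number of multiply-subtract steps to reduce an n by n matrix to U."""
    count = 0
    for stage in range(n):
        m = n - stage            # size of the working matrix at this stage
        count += m * m - m       # m*m entries, but the pivot row does not change
    return count

def solve_count(n):
    """Steps on the right side: forward elimination on b, then back substitution."""
    forward = sum(range(1, n))        # (n-1) + (n-2) + ... + 1
    backward = sum(range(1, n + 1))   # 1 + 2 + ... + n
    return forward + backward

def sum_of_squares(n):
    return sum(k * k for k in range(1, n + 1))

for n in (10, 100, 1000):
    print(n, factor_count(n), solve_count(n), sum_of_squares(n))
```

The exact factor count comes out as $(n^3 - n)/3$, slightly below the rough sum of squares, and both are close to $\frac{1}{3}n^3$ when n is large.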


A band matrix B has only w nonzero diagonals below and above its main diagonal.
The zero entries outside the band stay zero in elimination (they are zero in L and U).
Clearing out the first column needs $w^2$ multiplications and subtractions (w zeros to be
produced below the pivot, each one using a pivot row of length w). Then clearing out all n
columns, to reach U, needs no more than $nw^2$. This saves a lot of time:


Band matrix   A to U: $\frac{1}{3}n^3$ reduces to $nw^2$.   Solve: $n^2$ reduces to $2nw$.


A tridiagonal matrix (bandwidth w = 1) allows very fast computation. Don’t store zeros!
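
For the tridiagonal case, “don’t store zeros” can mean keeping just three lists: the diagonal below, the main diagonal, and the diagonal above. The Python sketch below is an illustration under those assumptions (the name solve_tridiagonal and its argument layout are hypothetical). It does elimination with no row exchanges, assumes every pivot is nonzero, and needs only about n steps forward and n steps back.

```python
from fractions import Fraction

def solve_tridiagonal(lower, diag, upper, b):
    """Solve Tx = b for tridiagonal T stored as three diagonals.

    lower and upper have length n - 1, diag and b have length n."""
    n = len(diag)
    d = [Fraction(v) for v in diag]     # becomes the pivots
    c = [Fraction(v) for v in b]        # becomes the new right side c
    for i in range(1, n):
        ell = Fraction(lower[i - 1]) / d[i - 1]   # the only multiplier in row i
        d[i] -= ell * upper[i - 1]
        c[i] -= ell * c[i - 1]
    x = [Fraction(0)] * n
    x[n - 1] = c[n - 1] / d[n - 1]
    for i in range(n - 2, -1, -1):                # back substitution
        x[i] = (c[i] - upper[i] * x[i + 1]) / d[i]
    return x

# The -1, 2, -1 matrix of size 3 with right side (4, 0, 0)
x = solve_tridiagonal([-1, -1], [2, 2, 2], [-1, -1], [4, 0, 0])
print([str(v) for v in x])
```

The pivots computed along the way for this matrix are 2, 3/2, 4/3, the same numbers that appear on the diagonal of U in Example 1 with plus signs off the diagonal.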

The book’s website has Teaching Codes to factor A into LU and to solve Ax = b. 
Professional codes will look down each column for the largest available pivot, to exchange 
rows and reduce roundoff error. 


MATLAB’s backslash command x = A\b combines Factor and Solve to reach x.


How long does it take to solve Ax = b? For a random matrix of order n = 1000, 
a typical time on a PC is 1 second. The time is multiplied by about 8 when n is multiplied 
by 2. For professional codes go to netlib.org. 

According to this $n^3$ rule, matrices that are 10 times as large (order 10,000) will take a
thousand seconds. Matrices of order 100,000 will take a million seconds. This is too
expensive without a supercomputer, but remember that these matrices are full. Most matrices
in practice are sparse (many zero entries). In that case A = LU is much faster.


= REVIEW OF THE KEY IDEAS = 


1. Gaussian elimination (with no row exchanges) factors A into L times U.


2. The lower triangular L contains the numbers $\ell_{ij}$ that multiply pivot rows, going from
A to U. The product LU adds those rows back to recover A.


3. On the right side we solve Lc = b (forward) and Ux = c (backward).


4. Factor: There are $\frac{1}{3}(n^3 - n)$ multiplications and subtractions on the left side.


5. Solve: There are $n^2$ multiplications and subtractions on the right side.


6. For a band matrix, change $\frac{1}{3}n^3$ to $nw^2$ and change $n^2$ to $2wn$.


= WORKED EXAMPLES =


2.6 A The lower triangular Pascal matrix L contains the famous “Pascal triangle”. 
Gauss-Jordan inverted L in the worked example 2.5 C. Here we factor Pascal. 
The symmetric Pascal matrix P is a product of triangular Pascal matrices L and
U. The symmetric P has Pascal’s triangle tilted, so each entry is the sum of the entry above
and the entry to the left. The n by n symmetric P is pascal(n) in MATLAB.


Problem: Establish the amazing lower-upper factorization P = LU. 


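$$\text{pascal}(4) = \begin{bmatrix} 1 & 1 & 1 & 1 \\ 1 & 2 & 3 & 4 \\ 1 & 3 & 6 & 10 \\ 1 & 4 & 10 & 20 \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 1 & 1 & 0 & 0 \\ 1 & 2 & 1 & 0 \\ 1 & 3 & 3 & 1 \end{bmatrix} \begin{bmatrix} 1 & 1 & 1 & 1 \\ 0 & 1 & 2 & 3 \\ 0 & 0 & 1 & 3 \\ 0 & 0 & 0 & 1 \end{bmatrix} = LU.$$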


Then predict and check the next row and column for 5 by 5 Pascal matrices. 


Solution You could multiply LU to get P. Better to start with the symmetric P and 
reach the upper triangular U by elimination : 


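$$P = \begin{bmatrix} 1 & 1 & 1 & 1 \\ 1 & 2 & 3 & 4 \\ 1 & 3 & 6 & 10 \\ 1 & 4 & 10 & 20 \end{bmatrix} \rightarrow \begin{bmatrix} 1 & 1 & 1 & 1 \\ 0 & 1 & 2 & 3 \\ 0 & 2 & 5 & 9 \\ 0 & 3 & 9 & 19 \end{bmatrix} \rightarrow \begin{bmatrix} 1 & 1 & 1 & 1 \\ 0 & 1 & 2 & 3 \\ 0 & 0 & 1 & 3 \\ 0 & 0 & 3 & 10 \end{bmatrix} \rightarrow \begin{bmatrix} 1 & 1 & 1 & 1 \\ 0 & 1 & 2 & 3 \\ 0 & 0 & 1 & 3 \\ 0 & 0 & 0 & 1 \end{bmatrix} = U.$$


The multipliers $\ell_{ij}$ that entered these steps go perfectly into L. Then P = LU is a
particularly neat example. Notice that every pivot is 1 on the diagonal of U.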
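
For readers without MATLAB, the factorization can be checked for several sizes in a few lines of Python. This sketch uses hypothetical function names and builds the Pascal matrices from binomial coefficients: L has entries C(i, j), and P has entries C(i + j, i), with rows and columns counted from 0. It then multiplies L by its transpose.

```python
from math import comb

def pascal_lower(n):
    """Lower triangular Pascal matrix: entry (i, j) is C(i, j), zero above the diagonal."""
    return [[comb(i, j) for j in range(n)] for i in range(n)]

def pascal_symmetric(n):
    """Symmetric Pascal matrix: entry (i, j) is C(i + j, i)."""
    return [[comb(i + j, i) for j in range(n)] for i in range(n)]

def transpose(X):
    return [list(col) for col in zip(*X)]

def matmul(X, Y):
    """Ordinary matrix product of two lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

for n in (4, 5):
    L = pascal_lower(n)
    U = transpose(L)                           # for Pascal, U is the transpose of L
    print(n, matmul(L, U) == pascal_symmetric(n))
print(pascal_symmetric(5)[4])                  # the next row for the 5 by 5 matrix
```

This confirms P = LU for the sizes tried. It is a check, not the general proof mentioned later in this example.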

The next section will show how symmetry produces a special relationship between the 
triangular L and U. For Pascal, U is the “transpose” of L. 

You might expect the MATLAB command lu (pascal (4)) to produce these L and U. 
That doesn’t happen because the lu subroutine chooses the largest available pivot in each 
column. The second pivot will change from 1 to 3. But a “Cholesky factorization” does no 
row exchanges: U = chol (pascal (4)) 

The full proof of P = LU for all Pascal sizes is quite fascinating. The paper “Pascal 
Matrices” is on the course web page web.mit.edu/18.06 which is also available through 
MIT’s OpenCourseWare at ocw.mit.edu. These Pascal matrices have so many remarkable 
properties—we will see them again. 


2.6 B The problem is: Solve Px = b = (1, 0, 0, 0). This right side = column of I
means that x will be the first column of $P^{-1}$. That is Gauss-Jordan, matching the columns
of $PP^{-1} = I$. We already know the Pascal matrices L and U as factors of P:


Two triangular systems   Lc = b (forward)   Ux = c (back).


Solution The lower triangular system Lc = b is solved top to bottom:


$$\begin{aligned} c_1 &= 1 \\ c_1 + c_2 &= 0 \\ c_1 + 2c_2 + c_3 &= 0 \\ c_1 + 3c_2 + 3c_3 + c_4 &= 0 \end{aligned} \qquad \text{gives} \qquad \begin{aligned} c_1 &= +1 \\ c_2 &= -1 \\ c_3 &= +1 \\ c_4 &= -1 \end{aligned}$$


Forward elimination is multiplication by $L^{-1}$. It produces the upper triangular system
Ux = c. The solution x comes as always by back substitution, bottom to top:


$$\begin{aligned} x_1 + x_2 + x_3 + x_4 &= 1 \\ x_2 + 2x_3 + 3x_4 &= -1 \\ x_3 + 3x_4 &= 1 \\ x_4 &= -1 \end{aligned} \qquad \text{gives} \qquad \begin{aligned} x_1 &= +4 \\ x_2 &= -6 \\ x_3 &= +4 \\ x_4 &= -1 \end{aligned}$$


I see a pattern in that x, but I don’t know where it comes from. Try inv(pascal(4)).


Problem Set 2.6 


Problems 1-14 compute the factorization A = LU (and also A = LDU). 


1 (Important) Forward elimination changes $\begin{bmatrix} 1 & 1 \\ 1 & 2 \end{bmatrix} x = b$ to a triangular $\begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix} x = c$:

$$\begin{aligned} x + y &= 5 \\ x + 2y &= 7 \end{aligned} \ \rightarrow \ \begin{aligned} x + y &= 5 \\ y &= 2 \end{aligned} \qquad \begin{bmatrix} 1 & 1 & 5 \\ 1 & 2 & 7 \end{bmatrix} \rightarrow \begin{bmatrix} 1 & 1 & 5 \\ 0 & 1 & 2 \end{bmatrix}$$

That step subtracted $\ell_{21}$ = ____ times row 1 from row 2. The reverse step adds $\ell_{21}$
times row 1 to row 2. The matrix for that reverse step is L = ____. Multiply this
L times the triangular system $\begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix} x = \begin{bmatrix} 5 \\ 2 \end{bmatrix}$ to get ____ = ____. In letters, L
multiplies Ux = c to give ____.

2 Write down the 2 by 2 triangular systems Lc = b and Ux = c from Problem 1.
Check that c = (5, 2) solves the first one. Find x that solves the second one.

3 (Move to 3 by 3) Forward elimination changes Ax = b to a triangular Ux = c:

$$\begin{aligned} x + y + z &= 5 \\ x + 2y + 3z &= 7 \\ x + 3y + 6z &= 11 \end{aligned} \qquad \begin{aligned} x + y + z &= 5 \\ y + 2z &= 2 \\ 2y + 5z &= 6 \end{aligned} \qquad \begin{aligned} x + y + z &= 5 \\ y + 2z &= 2 \\ z &= 2 \end{aligned}$$

The equation z = 2 in Ux = c comes from the original x + 3y + 6z = 11 in
Ax = b by subtracting $\ell_{31}$ = ____ times equation 1 and $\ell_{32}$ = ____ times the
final equation 2. Reverse that to recover [1 3 6 11] in the last row of A and b
from the final [1 1 1 5] and [0 1 2 2] and [0 0 1 2] in U and c:

Row 3 of [A b] = ($\ell_{31}$ Row 1 + $\ell_{32}$ Row 2 + 1 Row 3) of [U c].
In matrix notation this is multiplication by L. So A = LU and b = Lc.

4 What are the 3 by 3 triangular systems Lc = b and Ux = c from Problem 3?
Check that c = (5, 2, 2) solves the first one. Which x solves the second one?

5 What matrix E puts A into triangular form EA = U? Multiply by $E^{-1} = L$ to
factor A into LU:

$$A = \begin{bmatrix} 2 & 1 & 0 \\ 0 & 4 & 2 \\ 6 & 3 & 5 \end{bmatrix}.$$

6 What two elimination matrices $E_{21}$ and $E_{32}$ put A into upper triangular form
$E_{32}E_{21}A = U$? Multiply by $E_{32}^{-1}$ and $E_{21}^{-1}$ to factor A into $LU = E_{21}^{-1}E_{32}^{-1}U$:

$$A = \begin{bmatrix} 1 & 1 & 1 \\ 2 & 4 & 5 \\ 0 & 4 & 0 \end{bmatrix}.$$

7 What three elimination matrices $E_{21}, E_{31}, E_{32}$ put A into its upper triangular form
$E_{32}E_{31}E_{21}A = U$? Multiply by $E_{32}^{-1}$, $E_{31}^{-1}$ and $E_{21}^{-1}$ to factor A into L times U:

$$A = \begin{bmatrix} 1 & 0 & 1 \\ 2 & 2 & 2 \\ 3 & 4 & 5 \end{bmatrix} \qquad L = E_{21}^{-1}E_{31}^{-1}E_{32}^{-1}.$$

8 This is the problem that shows how the inverses $E_{ij}^{-1}$ multiply to give L. You see
this best when A is already lower triangular with 1’s on the diagonal. Then U = I!

The elimination matrices $E_{21}, E_{31}, E_{32}$ contain -a then -b then -c.

(a) Multiply $E_{32}E_{31}E_{21}$ to find the single matrix E that produces EA = I.
(b) Multiply $E_{21}^{-1}E_{31}^{-1}E_{32}^{-1}$ to bring back L.
The multipliers a, b, c are mixed up in E but perfect in L.

9 When zero appears in a pivot position, A = LU is not possible! (We are requiring
nonzero pivots in U.) Show directly why these equations are both impossible:

$$\begin{bmatrix} 0 & 1 \\ 2 & 3 \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ \ell & 1 \end{bmatrix} \begin{bmatrix} d & e \\ 0 & f \end{bmatrix} \qquad \begin{bmatrix} 1 & 1 & 0 \\ 1 & 1 & 2 \\ 1 & 2 & 1 \end{bmatrix} = \begin{bmatrix} 1 & & \\ \ell & 1 & \\ m & n & 1 \end{bmatrix} \begin{bmatrix} d & e & g \\ & f & h \\ & & i \end{bmatrix}.$$

These matrices need a row exchange. That uses a “permutation matrix” P.

10 Which number c leads to zero in the second pivot position? A row exchange is
needed and A = LU will not be possible. Which c produces zero in the third pivot
position? Then a row exchange can’t help and elimination fails:

$$A = \begin{bmatrix} 1 & c & 0 \\ 2 & 4 & 1 \\ 3 & 5 & 1 \end{bmatrix}.$$
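11 What are L and D (the diagonal pivot matrix) for this matrix A? What is U in
A = LU and what is the new U in A = LDU?

$$\text{Already triangular} \qquad A = \begin{bmatrix} 2 & 4 & 8 \\ 0 & 3 & 9 \\ 0 & 0 & 7 \end{bmatrix}.$$

12 A and B are symmetric across the diagonal (because 4 = 4). Find their triple
factorizations LDU and say how U is related to L for these symmetric matrices:

$$\text{Symmetric} \qquad A = \begin{bmatrix} 2 & 4 \\ 4 & 11 \end{bmatrix} \quad\text{and}\quad B = \begin{bmatrix} 1 & 4 & 0 \\ 4 & 12 & 4 \\ 0 & 4 & 0 \end{bmatrix}.$$

13 (Recommended) Compute L and U for the symmetric matrix A:

$$A = \begin{bmatrix} a & a & a & a \\ a & b & b & b \\ a & b & c & c \\ a & b & c & d \end{bmatrix}.$$

Find four conditions on a, b, c, d to get A = LU with four pivots.

14 This nonsymmetric matrix will have the same L as in Problem 13:

$$\text{Find } L \text{ and } U \text{ for} \qquad A = \begin{bmatrix} a & r & r & r \\ a & b & s & s \\ a & b & c & t \\ a & b & c & d \end{bmatrix}.$$

Find the four conditions on a, b, c, d, r, s, t to get A = LU with four pivots.

Problems 15-16 use L and U (without needing A) to solve Ax = b.

15 Solve the triangular system Lc = b to find c. Then solve Ux = c to find x:

$$L = \begin{bmatrix} 1 & 0 \\ 4 & 1 \end{bmatrix} \quad\text{and}\quad U = \begin{bmatrix} 2 & 4 \\ 0 & 1 \end{bmatrix} \quad\text{and}\quad b = \begin{bmatrix} 2 \\ 11 \end{bmatrix}.$$

For safety multiply LU and solve Ax = b as usual. Circle c when you see it.

16 Solve Lc = b to find c. Then solve Ux = c to find x. What was A?

$$L = \begin{bmatrix} 1 & 0 & 0 \\ 1 & 1 & 0 \\ 1 & 1 & 1 \end{bmatrix} \quad\text{and}\quad U = \begin{bmatrix} 1 & 1 & 1 \\ 0 & 1 & 1 \\ 0 & 0 & 1 \end{bmatrix} \quad\text{and}\quad b = \begin{bmatrix} 4 \\ 5 \\ 6 \end{bmatrix}.$$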
17 (a) When you apply the usual elimination steps to L, what matrix do you reach?

$$L = \begin{bmatrix} 1 & 0 & 0 \\ \ell_{21} & 1 & 0 \\ \ell_{31} & \ell_{32} & 1 \end{bmatrix}.$$

(b) When you apply the same steps to I, what matrix do you get?
(c) When you apply the same steps to LU, what matrix do you get?

18 If A = LDU and also $A = L_1D_1U_1$ with all factors invertible, then $L = L_1$ and
$D = D_1$ and $U = U_1$. “The three factors are unique.”
Derive the equation $L_1^{-1}LD = D_1U_1U^{-1}$. Are the two sides triangular or diagonal?
Deduce $L = L_1$ and $U = U_1$ (they all have diagonal 1’s). Then $D = D_1$.

19 Tridiagonal matrices have zero entries except on the main diagonal and the two
adjacent diagonals. Factor these into A = LU and $A = LDL^T$:

$$A = \begin{bmatrix} 1 & 1 & 0 \\ 1 & 2 & 1 \\ 0 & 1 & 2 \end{bmatrix} \quad\text{and}\quad A = \begin{bmatrix} a & a & 0 \\ a & a+b & b \\ 0 & b & b+c \end{bmatrix}.$$

20 When T is tridiagonal, its L and U factors have only two nonzero diagonals. How
would you take advantage of knowing the zeros in T, in a code for Gaussian
elimination? Find L and U.

$$\text{Tridiagonal} \qquad T = \begin{bmatrix} 1 & 2 & 0 & 0 \\ 2 & 3 & 1 & 0 \\ 0 & 1 & 2 & 3 \\ 0 & 0 & 3 & 4 \end{bmatrix}.$$

21 If A and B have nonzeros in the positions marked by x, which zeros (marked by 0)
stay zero in their factors L and U?

$$A = \begin{bmatrix} x & x & x & x \\ x & x & x & 0 \\ 0 & x & x & x \\ 0 & 0 & x & x \end{bmatrix} \qquad B = \begin{bmatrix} x & x & x & 0 \\ x & x & 0 & x \\ x & 0 & x & x \\ 0 & x & x & x \end{bmatrix}.$$

22 Suppose you eliminate upwards (almost unheard of). Use the last row to produce
zeros in the last column (the pivot is 1). Then use the second row to produce zero
above the second pivot. Find the factors in the unusual order A = UL.

$$\text{Upper times lower} \qquad A = \begin{bmatrix} 5 & 3 & 1 \\ 3 & 3 & 1 \\ 1 & 1 & 1 \end{bmatrix}.$$

23 Easy but important. If A has pivots 5, 9, 3 with no row exchanges, what are the pivots
for the upper left 2 by 2 submatrix $A_2$ (without row 3 and column 3)?
Challenge Problems

24 Which invertible matrices allow A = LU (elimination without row exchanges)?
Good question! Look at each of the square upper left submatrices $A_k$ of A.

All upper left k by k submatrices $A_k$ must be invertible (sizes k = 1, ..., n).
Explain that answer: $A_k$ factors into ____ because $LU = \begin{bmatrix} L_k & 0 \\ * & * \end{bmatrix} \begin{bmatrix} U_k & * \\ 0 & * \end{bmatrix}$.

25 For the 6 by 6 second difference constant-diagonal matrix K, put the pivots and
multipliers into K = LU. (L and U will have only two nonzero diagonals, because
K has three.) Find a formula for the i, j entry of $L^{-1}$, by software like MATLAB
using inv(L) or by looking for a nice pattern.

$$\text{-1, 2, -1 matrix} \qquad K = \begin{bmatrix} 2 & -1 & & \\ -1 & \ddots & \ddots & \\ & \ddots & \ddots & -1 \\ & & -1 & 2 \end{bmatrix} = \text{toeplitz}([2 \ -1 \ 0 \ 0 \ 0 \ 0]).$$

26 If you print $K^{-1}$, it doesn’t look so good (6 by 6). But if you print $7K^{-1}$,
that matrix looks wonderful. Write down $7K^{-1}$ by hand, following this pattern:

1 Row 1 and column 1 are (6, 5, 4, 3, 2, 1).
2 On and above the main diagonal, row i is i times row 1.
3 On and below the main diagonal, column j is j times column 1.

Multiply K times that $7K^{-1}$ to produce 7I. Here is $4K^{-1}$ for n = 3:

$$\begin{array}{l} \text{3 by 3 case} \\ \text{The determinant} \\ \text{of this } K \text{ is 4} \end{array} \qquad (K)(4K^{-1}) = \begin{bmatrix} 2 & -1 & 0 \\ -1 & 2 & -1 \\ 0 & -1 & 2 \end{bmatrix} \begin{bmatrix} 3 & 2 & 1 \\ 2 & 4 & 2 \\ 1 & 2 & 3 \end{bmatrix} = \begin{bmatrix} 4 & & \\ & 4 & \\ & & 4 \end{bmatrix}.$$
2.7 Transposes and Permutations


1 The transposes of Ax and AB and $A^{-1}$ are $x^TA^T$ and $B^TA^T$ and $(A^T)^{-1}$.


2 The dot product (inner product) is $x \cdot y = x^Ty$. This is $(1 \times n)(n \times 1) = (1 \times 1)$.
The outer product is $xy^T$ = column times row = $(n \times 1)(1 \times n)$ = $n \times n$ matrix.

3 The idea behind $A^T$ is that $Ax \cdot y$ equals $x \cdot A^Ty$ because $(Ax)^Ty = x^TA^Ty = x^T(A^Ty)$.

4 A symmetric matrix has $S^T = S$ (and the product $A^TA$ is always symmetric).


5 An orthogonal matrix has $Q^T = Q^{-1}$. The columns of Q are orthogonal unit vectors.


6 A permutation matrix P has the same rows as I (in any order). There are n! different orders.


7 Then Px puts the components $x_1, x_2, \ldots, x_n$ in that new order. And $P^T$ equals $P^{-1}$.


We need one more matrix, and fortunately it is much simpler than the inverse. It is the 
“transpose” of A, which is denoted by $A^T$. The columns of $A^T$ are the rows of A. 

When A is an m by n matrix, the transpose is n by m: 

$$\textbf{Transpose} \qquad \text{If } A = \begin{bmatrix} 1 & 2 & 3 \\ 0 & 0 & 4 \end{bmatrix} \text{ then } A^T = \begin{bmatrix} 1 & 0 \\ 2 & 0 \\ 3 & 4 \end{bmatrix}.$$

You can write the rows of A into the columns of $A^T$. Or you can write the columns of A 
into the rows of $A^T$. The matrix “flips over” its main diagonal. The entry in row i, column j 
of $A^T$ comes from row j, column i of the original A: 

$$\textbf{Exchange rows and columns} \qquad (A^T)_{ij} = A_{ji}$$

The transpose of a lower triangular matrix is upper triangular. (But the inverse is still lower 
triangular.) The transpose of $A^T$ is A. 
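The rule $(A^T)_{ij} = A_{ji}$ translates directly into code. This is a minimal sketch with matrices stored as lists of rows; the helper name `transpose` is chosen here for illustration.

```python
def transpose(A):
    """Return A^T: entry (i, j) of the result is entry (j, i) of A."""
    rows, cols = len(A), len(A[0])
    return [[A[j][i] for j in range(rows)] for i in range(cols)]

A = [[1, 2, 3],
     [0, 0, 4]]            # 2 by 3
At = transpose(A)          # 3 by 2
print(At)                  # [[1, 0], [2, 0], [3, 4]]
print(transpose(At) == A)  # True: the transpose of A^T is A

lower = [[1, 0], [5, 1]]   # lower triangular
print(transpose(lower))    # [[1, 5], [0, 1]] is upper triangular
```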


Note MATLAB's symbol for the transpose of A is A'. Typing [1 2 3] gives a row vector 
and the column vector is v = [1 2 3]'. To enter a matrix M with second column 
w = [4 5 6]' you could define M = [v w]. Quicker to enter by rows and then 
transpose the whole matrix: M = [1 2 3; 4 5 6]'. 


The rules for transposes are very direct. We can transpose A + B to get $(A + B)^T$. 
Or we can transpose A and B separately, and then add $A^T + B^T$, with the same result. 
The serious questions are about the transpose of a product AB and an inverse $A^{-1}$: 

Sum       The transpose of $A + B$ is $A^T + B^T$. (1) 
Product   The transpose of $AB$ is $(AB)^T = B^TA^T$. (2) 
Inverse   The transpose of $A^{-1}$ is $(A^{-1})^T = (A^T)^{-1}$. (3) 


Notice especially how $B^TA^T$ comes in reverse order. For inverses, this reverse order 
was quick to check: $B^{-1}A^{-1}$ times $AB$ produces I. To understand $(AB)^T = B^TA^T$, 
start with $(Ax)^T = x^TA^T$ when B is just a vector: 

$Ax$ combines the columns of A while $x^TA^T$ combines the rows of $A^T$. 

It is the same combination of the same vectors! In A they are columns, in $A^T$ they are 
rows. So the transpose of the column $Ax$ is the row $x^TA^T$. That fits our formula 
$(Ax)^T = x^TA^T$. Now we can prove the formula $(AB)^T = B^TA^T$, when B has several columns. 


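If $B = [x_1 \;\; x_2]$ has two columns, apply the same idea to each column. The columns 
of AB are $Ax_1$ and $Ax_2$. Their transposes appear correctly in the rows of $B^TA^T$: 

$$\text{Transposing } AB = \begin{bmatrix} Ax_1 & Ax_2 & \cdots \end{bmatrix} \text{ gives } \begin{bmatrix} x_1^TA^T \\ x_2^TA^T \\ \vdots \end{bmatrix} \text{ which is } B^TA^T. \tag{4}$$

The right answer $B^TA^T$ comes out a row at a time. Here are numbers in $(AB)^T = B^TA^T$: 

$$AB = \begin{bmatrix} 1 & 0 \\ 1 & 1 \end{bmatrix} \begin{bmatrix} 5 & 0 \\ 4 & 1 \end{bmatrix} = \begin{bmatrix} 5 & 0 \\ 9 & 1 \end{bmatrix} \quad \text{and} \quad B^TA^T = \begin{bmatrix} 5 & 4 \\ 0 & 1 \end{bmatrix} \begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix} = \begin{bmatrix} 5 & 9 \\ 0 & 1 \end{bmatrix}.$$

The reverse order rule extends to three or more factors: $(ABC)^T$ equals $C^TB^TA^T$. 
If $A = LDU$ then $A^T = U^TD^TL^T$. The pivot matrix has $D = D^T$. 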
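The numbers above can be reproduced in a few lines. This sketch uses plain Python lists; `transpose` and `matmul` are helper names introduced here. It also shows that the "wrong order" $A^TB^T$ gives a different matrix for this A and B.

```python
def transpose(M):
    """Rows become columns."""
    return [list(col) for col in zip(*M)]

def matmul(A, B):
    """Plain row-times-column matrix product."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

A = [[1, 0], [1, 1]]
B = [[5, 0], [4, 1]]

AB = matmul(A, B)
reverse_order = matmul(transpose(B), transpose(A))   # B^T A^T
same_order = matmul(transpose(A), transpose(B))      # A^T B^T

print(AB)                              # [[5, 0], [9, 1]]
print(transpose(AB) == reverse_order)  # True
print(transpose(AB) == same_order)     # False for this A and B
```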


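Now apply this product rule by transposing both sides of $A^{-1}A = I$. On one side, 
$I^T$ is I. We confirm the rule that $(A^{-1})^T$ is the inverse of $A^T$. Their product is I: 

$$\textbf{Transpose of inverse} \qquad A^{-1}A = I \text{ is transposed to } A^T(A^{-1})^T = I. \tag{5}$$

Similarly $AA^{-1} = I$ leads to $(A^{-1})^TA^T = I$. We can invert the transpose or we can 
transpose the inverse. Notice especially: $A^T$ is invertible exactly when A is invertible. 

Example 1 The inverse of $A = \begin{bmatrix} 1 & 0 \\ 6 & 1 \end{bmatrix}$ is $A^{-1} = \begin{bmatrix} 1 & 0 \\ -6 & 1 \end{bmatrix}$. The transpose is $A^T = \begin{bmatrix} 1 & 6 \\ 0 & 1 \end{bmatrix}$. 

$(A^{-1})^T$ and $(A^T)^{-1}$ are both equal to $\begin{bmatrix} 1 & -6 \\ 0 & 1 \end{bmatrix}$. 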
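Example 1 can be checked with exact arithmetic. The sketch below assumes the 2 by 2 matrix A = [1 0; 6 1] and uses the standard cofactor formula for a 2 by 2 inverse; `inverse_2x2` is a helper written for this illustration.

```python
from fractions import Fraction

def transpose(M):
    return [list(col) for col in zip(*M)]

def inverse_2x2(M):
    """Inverse of [[a, b], [c, d]] by the formula (1/det) [[d, -b], [-c, a]]."""
    (a, b), (c, d) = M
    det = Fraction(a * d - b * c)
    if det == 0:
        raise ValueError("matrix is not invertible")
    return [[d / det, -b / det], [-c / det, a / det]]

A = [[1, 0], [6, 1]]
transpose_of_inverse = transpose(inverse_2x2(A))   # (A^{-1})^T
inverse_of_transpose = inverse_2x2(transpose(A))   # (A^T)^{-1}

print(transpose_of_inverse == inverse_of_transpose)  # True
print(transpose_of_inverse == [[1, -6], [0, 1]])     # True
```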
The Meaning of Inner Products 


We know the dot product (inner product) of x and y. It is the sum of numbers $x_iy_i$. 
Now we have a better way to write $x \cdot y$, without using that unprofessional dot. Use 
matrix notation instead: 

T is inside    The dot product or inner product is $x^Ty$    $(1 \times n)(n \times 1)$ 
T is outside   The rank one product or outer product is $xy^T$    $(n \times 1)(1 \times n)$ 

$x^Ty$ is a number, $xy^T$ is a matrix. Quantum mechanics would write those as $\langle x | y \rangle$ 
(inner) and $| x \rangle \langle y |$ (outer). Maybe the universe is governed by linear algebra. Here are 
three more examples where the inner product has meaning: 

From mechanics   Work = (Movements) (Forces) = $x^Tf$ 
From circuits    Heat loss = (Voltage drops) (Currents) = $e^Ty$ 
From economics   Income = (Quantities) (Prices) = $q^Tp$ 


We are really close to the heart of applied mathematics, and there is one more point to 
emphasize. It is the deeper connection between inner products and the transpose of A. 

We defined $A^T$ by flipping the matrix across its main diagonal. That's not mathematics. 
There is a better way to approach the transpose. $A^T$ is the matrix that makes these two 
inner products equal for every x and y: 

$$(Ax)^Ty = x^T(A^Ty) \qquad \text{Inner product of } Ax \text{ with } y = \text{Inner product of } x \text{ with } A^Ty$$

$$\text{Start with } A = \begin{bmatrix} -1 & 1 & 0 \\ 0 & -1 & 1 \end{bmatrix} \qquad x = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} \qquad y = \begin{bmatrix} y_1 \\ y_2 \end{bmatrix}$$

On one side we have $Ax$ multiplying y: $(x_2 - x_1)y_1 + (x_3 - x_2)y_2$. 
That is the same as $x_1(-y_1) + x_2(y_1 - y_2) + x_3(y_2)$. Now x is multiplying $A^Ty$. 

$$A^Ty \text{ must be } \begin{bmatrix} -y_1 \\ y_1 - y_2 \\ y_2 \end{bmatrix} \text{ which produces } A^T = \begin{bmatrix} -1 & 0 \\ 1 & -1 \\ 0 & 1 \end{bmatrix} \text{ as expected.}$$
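A numerical spot check of $(Ax)^Ty = x^T(A^Ty)$ for the difference matrix above. The vectors x and y are arbitrary sample values chosen for this sketch; one example illustrates the identity but does not prove it.

```python
def transpose(M):
    return [list(col) for col in zip(*M)]

def matvec(M, v):
    """Matrix times column vector."""
    return [sum(m * vi for m, vi in zip(row, v)) for row in M]

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

A = [[-1, 1, 0],
     [0, -1, 1]]
x = [3, 5, 8]      # sample vector with 3 components
y = [2, 7]         # sample vector with 2 components

left = dot(matvec(A, x), y)               # (Ax)^T y
right = dot(x, matvec(transpose(A), y))   # x^T (A^T y)
print(left, right)                        # 25 25
```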


Symmetric Matrices 


For a symmetric matrix, transposing A to $A^T$ produces no change. Then $A^T$ equals A. 
Its (j, i) entry across the main diagonal equals its (i, j) entry. In my opinion, these are the 
most important matrices of all. We give symmetric matrices the special letter S. 

DEFINITION A symmetric matrix has $S^T = S$. This means that $s_{ji} = s_{ij}$. 

$$\textbf{Symmetric matrices} \qquad S = \begin{bmatrix} 1 & 2 \\ 2 & 5 \end{bmatrix} = S^T \quad \text{and} \quad D = \begin{bmatrix} 1 & 0 \\ 0 & 10 \end{bmatrix} = D^T.$$

The inverse of a symmetric matrix is also symmetric. The transpose of $S^{-1}$ is 
$(S^{-1})^T = (S^T)^{-1} = S^{-1}$. That says $S^{-1}$ is symmetric (when S is invertible): 

$$\textbf{Symmetric inverses} \qquad S^{-1} = \begin{bmatrix} 5 & -2 \\ -2 & 1 \end{bmatrix} \quad \text{and} \quad D^{-1} = \begin{bmatrix} 1 & 0 \\ 0 & 0.1 \end{bmatrix}.$$


Now we produce a symmetric matrix S by multiplying any matrix A by $A^T$. 


Symmetric Products $A^TA$ and $AA^T$ and $LDL^T$ 


Choose any matrix A, probably rectangular. Multiply $A^T$ times A. Then the product 
$S = A^TA$ is automatically a square symmetric matrix: 

$$\text{The transpose of } A^TA \text{ is } A^T(A^T)^T \text{ which is } A^TA \text{ again.} \tag{6}$$

That is a quick proof of symmetry for $A^TA$. We could look at the (i, j) entry of $A^TA$. 
It is the dot product of row i of $A^T$ (column i of A) with column j of A. The (j, i) entry 
is the same dot product, column j with column i. So $A^TA$ is symmetric. 

The matrix $AA^T$ is also symmetric. (The shapes of A and $A^T$ allow multiplication.) 
But $AA^T$ is a different matrix from $A^TA$. In our experience, most scientific problems that 
start with a rectangular matrix A end up with $A^TA$ or $AA^T$ or both. As in least squares. 

Example 2 Multiply $A = \begin{bmatrix} -1 & 1 & 0 \\ 0 & -1 & 1 \end{bmatrix}$ and $A^T = \begin{bmatrix} -1 & 0 \\ 1 & -1 \\ 0 & 1 \end{bmatrix}$ in both orders. 

$$AA^T = \begin{bmatrix} 2 & -1 \\ -1 & 2 \end{bmatrix} \quad \text{and} \quad A^TA = \begin{bmatrix} 1 & -1 & 0 \\ -1 & 2 & -1 \\ 0 & -1 & 1 \end{bmatrix} \quad \text{are both symmetric matrices.}$$

The product $A^TA$ is n by n. In the opposite order, $AA^T$ is m by m. Both are symmetric, 
with positive diagonal (why?). But even if m = n, it is very likely that $A^TA \neq AA^T$. 
Equality can happen, but it is abnormal. 
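Example 2 can be reproduced directly. In this sketch the helper names are illustrative; the last lines hint at the "why?" for this particular A, since each diagonal entry of $A^TA$ is the squared length of a column of A.

```python
def transpose(M):
    return [list(col) for col in zip(*M)]

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

def is_symmetric(S):
    return S == transpose(S)

A = [[-1, 1, 0],
     [0, -1, 1]]            # m = 2 by n = 3
At = transpose(A)

AAt = matmul(A, At)         # m by m
AtA = matmul(At, A)         # n by n
print(AAt)                  # [[2, -1], [-1, 2]]
print(AtA)                  # [[1, -1, 0], [-1, 2, -1], [0, -1, 1]]
print(is_symmetric(AAt), is_symmetric(AtA))   # True True

# Diagonal of A^T A = squared lengths of the columns of A
column_lengths_squared = [sum(v * v for v in col) for col in At]
print(column_lengths_squared == [AtA[i][i] for i in range(3)])   # True
```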


Symmetric matrices in elimination $S^T = S$ makes elimination faster, because we can 
work with half the matrix (plus the diagonal). It is true that the upper triangular U is 
probably not symmetric. The symmetry is in the triple product S = LDU. Remember 
how the diagonal matrix D of pivots can be divided out, to leave 1's on the diagonal of 
both L and U: 

$$\begin{bmatrix} 1 & 2 \\ 2 & 7 \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 2 & 1 \end{bmatrix} \begin{bmatrix} 1 & 2 \\ 0 & 3 \end{bmatrix} \qquad LU \text{ misses the symmetry of } S$$

$$\begin{bmatrix} 1 & 2 \\ 2 & 7 \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 2 & 1 \end{bmatrix} \begin{bmatrix} 1 & 0 \\ 0 & 3 \end{bmatrix} \begin{bmatrix} 1 & 2 \\ 0 & 1 \end{bmatrix} \qquad LDL^T \text{ captures the symmetry. Now } U \text{ is the transpose of } L.$$

When S is symmetric, the usual form A = LDU becomes $S = LDL^T$. The final U 
(with 1's on the diagonal) is the transpose of L (also with 1's on the diagonal). The 
diagonal matrix D containing the pivots is symmetric by itself. 

If $S = S^T$ is factored into LDU with no row exchanges, then U is exactly $L^T$. 

The symmetric factorization of a symmetric matrix is $S = LDL^T$. 

Notice that the transpose of $LDL^T$ is automatically $(L^T)^TD^TL^T$ which is $LDL^T$ 
again. The work of elimination is cut in half, from $n^3/3$ multiplications to $n^3/6$. The 
storage is also cut essentially in half. We only keep L and D, not U which is just $L^T$. 
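The factorization can be sketched with ordinary elimination. The function `ldlt` below is written for this illustration and assumes no row exchanges are needed; for clarity it eliminates on the full matrix, so it does not show the halved work count described above.

```python
from fractions import Fraction

def ldlt(S):
    """Return (L, d) with S = L diag(d) L^T, assuming every pivot is nonzero."""
    n = len(S)
    U = [[Fraction(v) for v in row] for row in S]
    L = [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]
    for k in range(n):
        if U[k][k] == 0:
            raise ValueError("zero pivot: a row exchange would be needed")
        for i in range(k + 1, n):
            L[i][k] = U[i][k] / U[k][k]          # multiplier l_ik
            for j in range(k, n):
                U[i][j] -= L[i][k] * U[k][j]     # row i minus l_ik times row k
    d = [U[k][k] for k in range(n)]              # the pivots
    return L, d

def rebuild(L, d):
    """Multiply out L diag(d) L^T."""
    n = len(d)
    return [[sum(L[i][k] * d[k] * L[j][k] for k in range(n))
             for j in range(n)] for i in range(n)]

S = [[1, 2], [2, 7]]
L, d = ldlt(S)
print(L == [[1, 0], [2, 1]], d == [1, 3])   # True True
print(rebuild(L, d) == S)                   # True
```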


Permutation Matrices 


The transpose plays a special role for a permutation matrix. This matrix P has a single “1” 
in every row and every column. Then $P^T$ is also a permutation matrix, maybe the same 
as P or maybe different. Any product $P_1P_2$ is again a permutation matrix. 

We now create every P from the identity matrix, by reordering the rows of I. 

The simplest permutation matrix is P = I (no exchanges). The next simplest are the 
row exchanges $P_{ij}$. Those are constructed by exchanging two rows i and j of I. Other 
permutations reorder more rows. By doing all possible row exchanges to I, we get all 
possible permutation matrices: 

DEFINITION A permutation matrix P has the rows of the identity I in any order. 

Example 3 There are six 3 by 3 permutation matrices. Here they are without the zeros: 

$$I = \begin{bmatrix} 1 & & \\ & 1 & \\ & & 1 \end{bmatrix} \qquad P_{21} = \begin{bmatrix} & 1 & \\ 1 & & \\ & & 1 \end{bmatrix} \qquad P_{32}P_{21} = \begin{bmatrix} & 1 & \\ & & 1 \\ 1 & & \end{bmatrix}$$

$$P_{31} = \begin{bmatrix} & & 1 \\ & 1 & \\ 1 & & \end{bmatrix} \qquad P_{32} = \begin{bmatrix} 1 & & \\ & & 1 \\ & 1 & \end{bmatrix} \qquad P_{21}P_{32} = \begin{bmatrix} & & 1 \\ 1 & & \\ & 1 & \end{bmatrix}$$

There are n! permutation matrices of order n. The symbol n! means “n factorial,” the 
product of the numbers $(1)(2)\cdots(n)$. Thus 3! = (1)(2)(3) which is 6. There will be 24 
permutation matrices of order n = 4. And 120 permutations of order 5. 

There are only two permutation matrices of order 2, namely $\begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}$ and $\begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}$. 

Important: $P^{-1}$ is also a permutation matrix. Among the six 3 by 3 P's displayed 
above, the four matrices on the left are their own inverses. The two matrices on the right 
are inverses of each other. In all cases, a single row exchange is its own inverse. If we 
repeat the exchange we are back to I. But for $P_{32}P_{21}$, the inverses go in opposite order 
as always. The inverse is $P_{21}P_{32}$. 
More important: $P^{-1}$ is always the same as $P^T$. The two matrices on the right are 
transposes (and inverses) of each other. When we multiply $PP^T$, the “1” in the first row 
of P hits the “1” in the first column of $P^T$ (since the first row of P is the first column of 
$P^T$). It misses the ones in all the other columns. So $PP^T = I$. 

Another proof of $P^T = P^{-1}$ looks at P as a product of row exchanges. Every row 
exchange is its own transpose and its own inverse. $P^T$ and $P^{-1}$ both come from the 
product of row exchanges in reverse order. So $P^T$ and $P^{-1}$ are the same. 
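These facts are easy to confirm by brute force for n = 3. The sketch below builds every permutation matrix from an ordering of the rows of I (using `itertools.permutations`) and checks $PP^T = I$; the helper names are illustrative.

```python
from itertools import permutations

def permutation_matrix(order):
    """Rows of I in the given order: row i of P is row order[i] of I."""
    n = len(order)
    return [[1 if j == order[i] else 0 for j in range(n)] for i in range(n)]

def transpose(M):
    return [list(col) for col in zip(*M)]

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

n = 3
identity = permutation_matrix(list(range(n)))
all_P = [permutation_matrix(order) for order in permutations(range(n))]

print(len(all_P))                                               # 6 = 3!
print(all(matmul(P, transpose(P)) == identity for P in all_P))  # True
own_inverse = [P for P in all_P if matmul(P, P) == identity]
print(len(own_inverse))                                         # 4
```

The count of 4 matches the text: I and the three single row exchanges are their own inverses.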


Permutations (row exchanges before elimination) lead to PA = LU. 


The PA = LU Factorization with Row Exchanges 
We sure hope you remember A = LU. It started with $A = (E_{21}^{-1} \cdots E_{ij}^{-1} \cdots)U$. Every 
elimination step was carried out by an $E_{ij}$ and it was inverted by $E_{ij}^{-1}$. Those inverses 
were compressed into one matrix L. The lower triangular L has 1's on the diagonal, and 
the result is A = LU. 

This is a great factorization, but it doesn't always work. Sometimes row exchanges 
are needed to produce pivots. Then $A = (E^{-1} \cdots P^{-1} \cdots E^{-1} \cdots P^{-1} \cdots)U$. Every 
row exchange is carried out by a $P_{ij}$ and inverted by that $P_{ij}$. We now compress those 
row exchanges into a single permutation matrix P. This gives a factorization for every 
invertible matrix A, which we naturally want. 

The main question is where to collect the $P_{ij}$'s. There are two good possibilities: 
do all the exchanges before elimination, or do them after the $E_{ij}$'s. The first way gives 
PA = LU. The second way has a permutation matrix $P_1$ in the middle. 


1. The row exchanges can be done in advance. Their product P puts the rows of A in 
the right order, so that no exchanges are needed for PA. Then PA = LU. 

2. If we hold row exchanges until after elimination, the pivot rows are in a strange order. 
$P_1$ puts them in the correct triangular order in $U_1$. Then $A = L_1P_1U_1$. 

PA = LU is constantly used in all computing. We will concentrate on this form. 

The factorization $A = L_1P_1U_1$ might be more elegant. If we mention both, it is because 
the difference is not well known. Probably you will not spend a long time on either one. 
Please don't. The most important case has P = I, when A equals LU with no exchanges. 

This matrix A starts with $a_{11} = 0$. Exchange rows 1 and 2 to bring the first pivot into 
its usual place. Then go through elimination on PA: 


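$$\underset{A}{\begin{bmatrix} 0 & 1 & 1 \\ 1 & 2 & 1 \\ 2 & 7 & 9 \end{bmatrix}} \to \underset{PA}{\begin{bmatrix} 1 & 2 & 1 \\ 0 & 1 & 1 \\ 2 & 7 & 9 \end{bmatrix}} \to \underset{\ell_{31} = 2}{\begin{bmatrix} 1 & 2 & 1 \\ 0 & 1 & 1 \\ 0 & 3 & 7 \end{bmatrix}} \to \underset{\ell_{32} = 3}{\begin{bmatrix} 1 & 2 & 1 \\ 0 & 1 & 1 \\ 0 & 0 & 4 \end{bmatrix}}$$

The matrix PA has its rows in good order, and it factors as usual into LU: 

$$P = \begin{bmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{bmatrix} \qquad PA = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 2 & 3 & 1 \end{bmatrix} \begin{bmatrix} 1 & 2 & 1 \\ 0 & 1 & 1 \\ 0 & 0 & 4 \end{bmatrix} = LU. \tag{7}$$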
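Equation (7) can be confirmed by multiplying both sides out. This is a small check with the matrices typed in by hand; `matmul` is a helper defined here.

```python
def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

A = [[0, 1, 1],
     [1, 2, 1],
     [2, 7, 9]]
P = [[0, 1, 0],      # exchanges rows 1 and 2
     [1, 0, 0],
     [0, 0, 1]]
L = [[1, 0, 0],
     [0, 1, 0],
     [2, 3, 1]]      # multipliers l31 = 2 and l32 = 3
U = [[1, 2, 1],
     [0, 1, 1],
     [0, 0, 4]]

print(matmul(P, A))                   # [[1, 2, 1], [0, 1, 1], [2, 7, 9]]
print(matmul(P, A) == matmul(L, U))   # True
```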
We started with A and ended with U. The only requirement is invertibility of A. 

If A is invertible, a permutation P will put its rows in the right order to factor PA = LU. 
There must be a full set of pivots after row exchanges for A to be invertible. 

In MATLAB, A([r k], :) = A([k r], :) exchanges row k with row r below it (where the 
kth pivot has been found). Then the lu code updates L and P and the sign of P. 
This is part of [L, U, P] = lu(A): 

    A([r k], :) = A([k r], :); 
    L([r k], 1:k-1) = L([k r], 1:k-1); 
    P([r k], :) = P([k r], :); 
    sign = -sign 

The “sign” of P tells whether the number of row exchanges is even (sign = +1). 
An odd number of row exchanges will produce sign = -1. At the start, P is I and sign 
= +1. When there is a row exchange, the sign is reversed. The final value of sign is the 
determinant of P and it does not depend on the order of the row exchanges. 

For PA we get back to the familiar LU. In reality, a code like lu(A) often does not 
use the first available pivot. Mathematically we can accept a small pivot, anything but 
zero. All good codes look down the column for the largest pivot. 

Section 11.1 explains why this “partial pivoting” reduces the roundoff error. Then 
P may contain row exchanges that are not algebraically necessary. Still PA = LU. 

Our advice is to understand permutations but let the computer do the work. Calculations 
of A = LU are enough to do by hand, without P. The Teaching Code splu(A) factors 
PA = LU and splv(A, b) solves Ax = b for any invertible A. The program splu on the 
website stops if no pivot can be found in column k. Then A is not invertible. 
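The row-exchange bookkeeping can be sketched in Python. This is not the Teaching Code splu and not MATLAB's lu; it is a hypothetical function `plu` that looks down each column for the largest pivot, swaps rows of U, of the stored multipliers in L, and of the row order, and flips the sign at every exchange. Exact fractions are used so that the check PA = LU is an exact equality.

```python
from fractions import Fraction

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

def plu(A):
    """Return (P, L, U, sign) with PA = LU, using partial pivoting."""
    n = len(A)
    U = [[Fraction(v) for v in row] for row in A]
    L = [[Fraction(0)] * n for _ in range(n)]
    order = list(range(n))       # order[i] = which row of A sits in row i
    sign = 1
    for k in range(n):
        # largest entry on or below the diagonal in column k
        r = max(range(k, n), key=lambda i: abs(U[i][k]))
        if U[r][k] == 0:
            raise ValueError("no pivot in column %d: A is not invertible" % (k + 1))
        if r != k:
            U[k], U[r] = U[r], U[k]
            L[k], L[r] = L[r], L[k]          # only columns < k are filled so far
            order[k], order[r] = order[r], order[k]
            sign = -sign
        for i in range(k + 1, n):
            L[i][k] = U[i][k] / U[k][k]
            for j in range(k, n):
                U[i][j] -= L[i][k] * U[k][j]
    for i in range(n):
        L[i][i] = Fraction(1)                # 1's on the diagonal of L
    P = [[1 if j == order[i] else 0 for j in range(n)] for i in range(n)]
    return P, L, U, sign

A = [[0, 1, 1], [1, 2, 1], [2, 7, 9]]
P, L, U, sign = plu(A)
print(matmul(P, A) == matmul(L, U))   # True
print(sign)                           # -1: one row exchange
```

Because the largest pivot is chosen, this P exchanges rows 1 and 3, which differs from the P in equation (7). Both choices give a valid PA = LU.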


REVIEW OF THE KEY IDEAS 


1. The transpose puts the rows of A into the columns of $A^T$. Then $(A^T)_{ij} = A_{ji}$. 

2. The transpose of AB is $B^TA^T$. The transpose of $A^{-1}$ is the inverse of $A^T$. 

3. The dot product is $x \cdot y = x^Ty$. Then $(Ax)^Ty$ equals the dot product $x^T(A^Ty)$. 

4. When S is symmetric ($S^T = S$), its LDU factorization is symmetric: $S = LDL^T$. 

5. A permutation matrix P has a 1 in each row and column, and $P^T = P^{-1}$. 

6. There are n! permutation matrices of size n. Half even, half odd. 

7. If A is invertible then a permutation P will reorder its rows for PA = LU. 
WORKED EXAMPLES 


2.7 A Applying the permutation P to the rows of S destroys its symmetry: 

$$P = \begin{bmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 1 & 0 & 0 \end{bmatrix} \qquad S = \begin{bmatrix} 1 & 4 & 5 \\ 4 & 2 & 6 \\ 5 & 6 & 3 \end{bmatrix} \qquad PS = \begin{bmatrix} 4 & 2 & 6 \\ 5 & 6 & 3 \\ 1 & 4 & 5 \end{bmatrix}$$

What permutation Q applied to the columns of PS will recover symmetry in PSQ? 
The numbers 1, 2, 3 must come back to the main diagonal (not necessarily in order). 
Show that Q is $P^T$, so that symmetry is saved by $PSP^T$. 

Solution To recover symmetry and put “2” back on the diagonal, column 2 of PS 
must move to column 1. Column 3 of PS (containing “3”) must move to column 2. 
Then the “1” moves to the 3, 3 position. The matrix that permutes columns is Q: 

$$PS = \begin{bmatrix} 4 & 2 & 6 \\ 5 & 6 & 3 \\ 1 & 4 & 5 \end{bmatrix} \qquad Q = \begin{bmatrix} 0 & 0 & 1 \\ 1 & 0 & 0 \\ 0 & 1 & 0 \end{bmatrix} \qquad PSQ = \begin{bmatrix} 2 & 6 & 4 \\ 6 & 3 & 5 \\ 4 & 5 & 1 \end{bmatrix} \text{ is symmetric.}$$

The matrix Q is $P^T$. This choice always recovers symmetry, because $PSP^T$ is guaranteed 
to be symmetric. (Its transpose is again $PSP^T$.) The matrix Q is also $P^{-1}$, because the 
inverse of every permutation matrix is its transpose. 

If D is a diagonal matrix, we are finding that $PDP^T$ is also diagonal. When P moves 
row 1 down to row 3, $P^T$ on the right will move column 1 to column 3. The (1, 1) entry 
moves down to (3, 1) and over to (3, 3). 
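Worked Example 2.7 A can be replayed in code. The diagonal entries 10, 20, 30 in the second part are sample values chosen for this sketch.

```python
def transpose(M):
    return [list(col) for col in zip(*M)]

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

P = [[0, 1, 0], [0, 0, 1], [1, 0, 0]]
S = [[1, 4, 5], [4, 2, 6], [5, 6, 3]]

PS = matmul(P, S)
PSPt = matmul(PS, transpose(P))
print(PS == transpose(PS))       # False: symmetry is destroyed
print(PSPt)                      # [[2, 6, 4], [6, 3, 5], [4, 5, 1]]
print(PSPt == transpose(PSPt))   # True: symmetry is recovered

D = [[10, 0, 0], [0, 20, 0], [0, 0, 30]]
print(matmul(matmul(P, D), transpose(P)))   # [[20, 0, 0], [0, 30, 0], [0, 0, 10]]
```

The last line shows the (1, 1) entry 10 arriving at position (3, 3), as described above.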


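2.7 B Find the symmetric factorization $S = LDL^T$ for the matrix S above. 

Solution To factor S into $LDL^T$ we eliminate as usual to reach U: 

$$S = \begin{bmatrix} 1 & 4 & 5 \\ 4 & 2 & 6 \\ 5 & 6 & 3 \end{bmatrix} \to \begin{bmatrix} 1 & 4 & 5 \\ 0 & -14 & -14 \\ 0 & -14 & -22 \end{bmatrix} \to \begin{bmatrix} 1 & 4 & 5 \\ 0 & -14 & -14 \\ 0 & 0 & -8 \end{bmatrix} = U.$$

The multipliers were $\ell_{21} = 4$ and $\ell_{31} = 5$ and $\ell_{32} = 1$. The pivots 1, -14, -8 go into D. 
When we divide the rows of U by those pivots, $L^T$ should appear: 

$$\textbf{Symmetric factorization when } S = S^T \qquad S = LDL^T = \begin{bmatrix} 1 & 0 & 0 \\ 4 & 1 & 0 \\ 5 & 1 & 1 \end{bmatrix} \begin{bmatrix} 1 & & \\ & -14 & \\ & & -8 \end{bmatrix} \begin{bmatrix} 1 & 4 & 5 \\ 0 & 1 & 1 \\ 0 & 0 & 1 \end{bmatrix}$$

This matrix S is invertible because it has three pivots. Its inverse is $(L^T)^{-1}D^{-1}L^{-1}$ and 
$S^{-1}$ is also symmetric. The numbers 14 and 8 will turn up in the denominators of $S^{-1}$. 
The “determinant” of S is the product of the pivots (1)(-14)(-8) = 112. 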
2.7 C For a rectangular A, this saddle-point matrix S is symmetric and important: 

$$\textbf{Block matrix from least squares} \qquad S = \begin{bmatrix} I & A \\ A^T & 0 \end{bmatrix} = S^T \text{ has size } m + n.$$

Apply block elimination to find a block factorization $S = LDL^T$. Then test invertibility: 

$$S \text{ is invertible} \iff A^TA \text{ is invertible} \iff Ax \neq 0 \text{ whenever } x \neq 0$$

Solution The first block pivot is I. Subtract $A^T$ times row 1 from row 2: 

$$\textbf{Block elimination} \qquad S = \begin{bmatrix} I & A \\ A^T & 0 \end{bmatrix} \text{ goes to } \begin{bmatrix} I & A \\ 0 & -A^TA \end{bmatrix}. \text{ This is } U.$$

The block pivot matrix D contains I and $-A^TA$. Then L and $L^T$ contain $A^T$ and A: 

$$\textbf{Block factorization} \qquad S = LDL^T = \begin{bmatrix} I & 0 \\ A^T & I \end{bmatrix} \begin{bmatrix} I & 0 \\ 0 & -A^TA \end{bmatrix} \begin{bmatrix} I & A \\ 0 & I \end{bmatrix}.$$

L is certainly invertible, with diagonal 1's. The inverse of the middle matrix involves 
$(A^TA)^{-1}$. Section 4.2 answers a key question about the matrix $A^TA$: 

When is $A^TA$ invertible? Answer: A must have independent columns. 
Then Ax = 0 only if x = 0. Otherwise Ax = 0 will lead to $A^TAx = 0$. 
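As a check on the block factorization, the three factors can be multiplied out one step at a time. This is a short verification under the block sizes assumed above (the first identity block is m by m, the second is n by n):

```latex
\begin{aligned}
LD &= \begin{bmatrix} I & 0 \\ A^T & I \end{bmatrix}
      \begin{bmatrix} I & 0 \\ 0 & -A^TA \end{bmatrix}
    = \begin{bmatrix} I & 0 \\ A^T & -A^TA \end{bmatrix}, \\[4pt]
(LD)L^T &= \begin{bmatrix} I & 0 \\ A^T & -A^TA \end{bmatrix}
           \begin{bmatrix} I & A \\ 0 & I \end{bmatrix}
         = \begin{bmatrix} I & A \\ A^T & A^TA - A^TA \end{bmatrix}
         = \begin{bmatrix} I & A \\ A^T & 0 \end{bmatrix} = S.
\end{aligned}
```

The lower right block cancels exactly, which is why the second block pivot must be $-A^TA$.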


Problem Set 2.7 
Questions 1-7 are about the rules for transpose matrices. 


1 Find $A^T$ and $A^{-1}$ and $(A^{-1})^T$ and $(A^T)^{-1}$ for 


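$$A = \begin{bmatrix} 1 & 0 \\ 9 & 3 \end{bmatrix} \quad \text{and also} \quad A = \begin{bmatrix} 1 & c \\ c & 0 \end{bmatrix}.$$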


2 Verify that $(AB)^T$ equals $B^TA^T$ but those are different from $A^TB^T$: 


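$$A = \begin{bmatrix} 1 & 0 \\ 2 & 1 \end{bmatrix} \qquad B = \begin{bmatrix} 1 & 3 \\ 0 & 1 \end{bmatrix} \qquad AB = \begin{bmatrix} 1 & 3 \\ 2 & 7 \end{bmatrix}.$$

Show also that $AA^T$ is different from $A^TA$. But both of those matrices are ____. 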


3 (a) The matrix $((AB)^{-1})^T$ comes from $(A^{-1})^T$ and $(B^{-1})^T$. In what order? 
(b) If U is upper triangular then $(U^{-1})^T$ is ____ triangular. 


4 Show that $A^2 = 0$ is possible but $A^TA = 0$ is not possible (unless A = zero matrix). 
5 (a) The row vector $x^T$ times A times the column y produces what number? 

$$x^TAy = \begin{bmatrix} 0 & 1 \end{bmatrix} \begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \end{bmatrix} \begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix} = \underline{\qquad}.$$

(b) This is the row $x^TA$ = ____ times the column y = (0, 1, 0). 

(c) This is the row $x^T = [\,0 \;\; 1\,]$ times the column $Ay$ = ____. 


6 The transpose of a block matrix $M = \begin{bmatrix} A & B \\ C & D \end{bmatrix}$ is $M^T$ = ____. Test an example. 
Under what conditions on A, B, C, D is the block matrix symmetric? 


7 True or false: 

(a) The block matrix $\begin{bmatrix} 0 & A \\ A & 0 \end{bmatrix}$ is automatically symmetric. 
(b) If A and B are symmetric then their product AB is symmetric. 
(c) If A is not symmetric then $A^{-1}$ is not symmetric. 
(d) When A, B, C are symmetric, the transpose of ABC is CBA. 

Questions 8-15 are about permutation matrices. 


8 Why are there n! permutation matrices of order n? 


9 If $P_1$ and $P_2$ are permutation matrices, so is $P_1P_2$. This still has the rows of I in 
some order. Give examples with $P_1P_2 \neq P_2P_1$ and $P_3P_4 = P_4P_3$. 


10 There are 12 “even” permutations of (1, 2, 3, 4), with an even number of exchanges. 
Two of them are (1, 2, 3, 4) with no exchanges and (4, 3, 2, 1) with two exchanges. 
List the other ten. Instead of writing each 4 by 4 matrix, just order the numbers. 


11 Which permutation makes PA upper triangular? Which permutations make $P_1AP_2$ 
lower triangular? Multiplying A on the right by $P_2$ exchanges the ____ of A. 

$$A = \begin{bmatrix} 0 & 0 & 6 \\ 1 & 2 & 3 \\ 0 & 4 & 5 \end{bmatrix}$$


12 Explain why the dot product of x and y equals the dot product of Px and Py. 
Then $(Px)^T(Py) = x^Ty$ tells us that $P^TP = I$ for any permutation. With 
x = (1, 2, 3) and y = (1, 4, 2) choose P to show that $Px \cdot y$ is not always $x \cdot Py$. 


13 (a) Find a 3 by 3 permutation matrix with $P^3 = I$ (but not P = I). 
(b) Find a 4 by 4 permutation P with $P^4 \neq I$. 


14 If P has 1's on the antidiagonal from (1, n) to (n, 1), describe PAP. Note $P = P^T$. 
15 All row exchange matrices are symmetric: $P^T = P$. Then $P^TP = I$ becomes 
$P^2 = I$. Other permutation matrices may or may not be symmetric. 

(a) If P sends row 1 to row 4, then $P^T$ sends row ____ to row ____. 
When $P^T = P$ the row exchanges come in pairs with no overlap. 

(b) Find a 4 by 4 example with $P^T = P$ that moves all four rows. 
Questions 16-21 are about symmetric matrices and their factorizations. 


16 If $A = A^T$ and $B = B^T$, which of these matrices are certainly symmetric? 

(a) $A^2 - B^2$ (b) $(A + B)(A - B)$ (c) $ABA$ (d) $ABAB$. 


17 Find 2 by 2 symmetric matrices $S = S^T$ with these properties: 

(a) S is not invertible. 

(b) S is invertible but cannot be factored into LU (row exchanges needed). 

(c) S can be factored into $LDL^T$ but not into $LL^T$ (because of negative D). 


18 (a) How many entries of S can be chosen independently, if $S = S^T$ is 5 by 5? 
(b) How do L and D (still 5 by 5) give the same number of choices in $LDL^T$? 
(c) How many entries can be chosen if A is skew-symmetric? ($A^T = -A$). 


19 Suppose A is rectangular (m by n) and S is symmetric (m by m). 

(a) Transpose $A^TSA$ to show its symmetry. What shape is this matrix? 

(b) Show why $A^TA$ has no negative numbers on its diagonal. 


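20 Factor these symmetric matrices into $S = LDL^T$. The pivot matrix D is diagonal: 

$$S = \begin{bmatrix} 1 & 3 \\ 3 & 2 \end{bmatrix} \quad \text{and} \quad S = \begin{bmatrix} 1 & b \\ b & c \end{bmatrix} \quad \text{and} \quad S = \begin{bmatrix} 2 & -1 & 0 \\ -1 & 2 & -1 \\ 0 & -1 & 2 \end{bmatrix}.$$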


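21 After elimination clears out column 1 below the first pivot, find the symmetric 2 by 
2 matrix that appears in the lower right corner: 

$$\text{Start from } S = \begin{bmatrix} 2 & 4 & 8 \\ 4 & 3 & 9 \\ 8 & 9 & 0 \end{bmatrix} \quad \text{and} \quad S = \begin{bmatrix} 1 & b & c \\ b & d & e \\ c & e & f \end{bmatrix}.$$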
Questions 22-24 are about the factorizations PA = LU and $A = L_1P_1U_1$. 


22 Find the PA = LU factorizations (and check them) for 

$$A = \begin{bmatrix} 0 & 1 & 1 \\ 1 & 0 & 1 \\ 2 & 3 & 4 \end{bmatrix} \quad \text{and} \quad A = \begin{bmatrix} 1 & 2 & 0 \\ 2 & 4 & 1 \\ 1 & 1 & 1 \end{bmatrix}.$$


23 Find a 4 by 4 permutation matrix (call it A) that needs 3 row exchanges to reach the 
end of elimination. For this matrix, what are its factors P, L, and U? 


24 Factor the following matrix into PA = LU. Factor it also into $A = L_1P_1U_1$ 
(hold the exchange of row 3 until 3 times row 1 is subtracted from row 2): 

$$A = \begin{bmatrix} 0 & 1 & 2 \\ 0 & 3 & 8 \\ 2 & 1 & 1 \end{bmatrix}.$$

25 Prove that the identity matrix cannot be the product of three row exchanges (or five). 


It can be the product of two exchanges (or four). 


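26 (a) Choose $E_{21}$ to remove the 3 below the first pivot. Then multiply $E_{21}SE_{21}^T$ to 
remove both 3's: 

$$S = \begin{bmatrix} 1 & 3 & 0 \\ 3 & 11 & 4 \\ 0 & 4 & 9 \end{bmatrix} \text{ is going toward } D = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 1 \end{bmatrix}.$$

(b) Choose $E_{32}$ to remove the 4 below the second pivot. Then S is reduced to D 
by $E_{32}E_{21}SE_{21}^TE_{32}^T = D$. Invert the E's to find L in $S = LDL^T$. 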


27 If every row of a 4 by 4 matrix contains the numbers 0, 1, 2, 3 in some order, can the 
matrix be symmetric? 


28 Prove that no reordering of rows and reordering of columns can transpose a typical 
matrix. (Watch the diagonal entries.) 


The next three questions are about applications of the identity $(Ax)^Ty = x^T(A^Ty)$. 


29 Wires go between Boston, Chicago, and Seattle. Those cities are at voltages $x_B$, $x_C$, 
$x_S$. With unit resistances between cities, the currents between cities are in y: 

$$y = Ax \text{ is } \begin{bmatrix} y_{BC} \\ y_{CS} \\ y_{BS} \end{bmatrix} = \begin{bmatrix} 1 & -1 & 0 \\ 0 & 1 & -1 \\ 1 & 0 & -1 \end{bmatrix} \begin{bmatrix} x_B \\ x_C \\ x_S \end{bmatrix}.$$

(a) Find the total currents $A^Ty$ out of the three cities. 
(b) Verify that $(Ax)^Ty$ agrees with $x^T(A^Ty)$: six terms in both. 
30 Producing $x_1$ trucks and $x_2$ planes needs $x_1 + 50x_2$ tons of steel, $40x_1 + 1000x_2$ 
pounds of rubber, and $2x_1 + 50x_2$ months of labor. If the unit costs $y_1, y_2, y_3$ are 
\$700 per ton, \$3 per pound, and \$3000 per month, what are the values of one truck 
and one plane? Those are the components of $A^Ty$. 


31 $Ax$ gives the amounts of steel, rubber, and labor to produce x in Problem 30. Find A. 
Then $Ax \cdot y$ is the ____ of inputs while $x \cdot A^Ty$ is the value of ____. 


32 The matrix P that multiplies (x, y, z) to give (z, x, y) is also a rotation matrix. 
Find P and $P^3$. The rotation axis a = (1, 1, 1) doesn't move, it equals Pa. 
What is the angle of rotation from v = (2, 3, -5) to Pv = (-5, 2, 3)? 


33 Write $A = \begin{bmatrix} 1 & 2 \\ 4 & 9 \end{bmatrix}$ as the product ES of an elementary row operation matrix E and a 
symmetric matrix S. 


34 Here is a new factorization of A into LS: triangular (with 1's) times symmetric: 
Start from A = LDU. Then A equals $L(U^T)^{-1}$ times $S = U^TDU$. 
Why is $L(U^T)^{-1}$ triangular? Its diagonal is all 1's. Why is $U^TDU$ symmetric? 


35 A group of matrices includes AB and $A^{-1}$ if it includes A and B. “Products and 
inverses stay in the group.” Which of these sets are groups? 
Lower triangular matrices L with 1's on the diagonal, symmetric matrices S, 
positive matrices M, diagonal invertible matrices D, permutation matrices P, 
matrices with $Q^T = Q^{-1}$. Invent two more matrix groups. 


Challenge Problems 


36 A square northwest matrix B is zero in the southeast corner, below the antidiagonal 
that connects (1, n) to (n, 1). Will $B^T$ and $B^2$ be northwest matrices? Will $B^{-1}$ be 
northwest or southeast? What is the shape of BC = northwest times southeast? 


37 If you take powers of a permutation matrix, why is some $P^k$ eventually equal to I? 
Find a 5 by 5 permutation P so that the smallest power to equal I is $P^6$. 


38 (a) Write down any 3 by 3 matrix M. Split M into S + A where $S = S^T$ is 
symmetric and $A = -A^T$ is anti-symmetric. 
(b) Find formulas for S and A involving M and $M^T$. We want M = S + A. 


39 Suppose $Q^T$ equals $Q^{-1}$ (transpose equals inverse, so $Q^TQ = I$). 

(a) Show that the columns $q_1, \ldots, q_n$ are unit vectors: $\|q_i\|^2 = 1$. 
(b) Show that every two columns of Q are perpendicular: $q_1^Tq_2 = 0$. 
(c) Find a 2 by 2 example with first entry $q_{11} = \cos\theta$. 
The Transpose of a Derivative 


Will you allow me a little calculus? It is extremely important or I wouldn't leave 
linear algebra. (This is really linear algebra for functions x(t).) The matrix changes 
to a derivative so A = d/dt. To find the transpose of this unusual A we need to define 
the inner product between two functions x(t) and y(t). 

The inner product changes from the sum of $x_ky_k$ to the integral of x(t) y(t). 

$$\textbf{Inner product of functions} \qquad x^Ty = (x, y) = \int_{-\infty}^{\infty} x(t)\,y(t)\,dt$$

From this inner product we know the requirement on $A^T$. The word “adjoint” is more 
correct than “transpose” when we are working with derivatives. 

The transpose of a matrix has $(Ax)^Ty = x^T(A^Ty)$. The adjoint of $A = \dfrac{d}{dt}$ has 

$$(Ax, y) = \int_{-\infty}^{\infty} \frac{dx}{dt}\,y(t)\,dt = \int_{-\infty}^{\infty} x(t)\left(-\frac{dy}{dt}\right)dt = (x, A^Ty)$$

I hope you recognize integration by parts. The derivative moves from the first 
function x(t) to the second function y(t). During that move, a minus sign appears. 
This tells us that the transpose of the derivative is minus the derivative. 

The derivative is antisymmetric: A = d/dt and $A^T = -d/dt$. Symmetric matrices 
have $S^T = S$, antisymmetric matrices have $A^T = -A$. Linear algebra includes derivatives 
and integrals in Chapter 8, because those are both linear. 

This antisymmetry of the derivative applies also to centered difference matrices. 

$$A = \begin{bmatrix} 0 & 1 & 0 & 0 \\ -1 & 0 & 1 & 0 \\ 0 & -1 & 0 & 1 \\ 0 & 0 & -1 & 0 \end{bmatrix} \text{ transposes to } A^T = \begin{bmatrix} 0 & -1 & 0 & 0 \\ 1 & 0 & -1 & 0 \\ 0 & 1 & 0 & -1 \\ 0 & 0 & 1 & 0 \end{bmatrix} = -A.$$

And a forward difference matrix transposes to a backward difference matrix, multiplied 
by -1. In differential equations, the second derivative (acceleration) is symmetric. The 
first derivative (damping proportional to velocity) is antisymmetric. 
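Both statements about difference matrices can be checked for a small size. The sketch below assumes square n by n matrices with the conventions (forward: $x_{i+1} - x_i$, backward: $x_i - x_{i-1}$, missing neighbors treated as zero); the function names are illustrative.

```python
def transpose(M):
    return [list(col) for col in zip(*M)]

def negate(M):
    return [[-v for v in row] for row in M]

def centered_difference(n):
    """+1 above the diagonal, -1 below it."""
    return [[1 if j == i + 1 else -1 if j == i - 1 else 0
             for j in range(n)] for i in range(n)]

def forward_difference(n):
    """Row i computes x[i+1] - x[i]."""
    return [[-1 if j == i else 1 if j == i + 1 else 0
             for j in range(n)] for i in range(n)]

def backward_difference(n):
    """Row i computes x[i] - x[i-1]."""
    return [[1 if j == i else -1 if j == i - 1 else 0
             for j in range(n)] for i in range(n)]

n = 4
A = centered_difference(n)
print(transpose(A) == negate(A))     # True: A^T = -A
print(transpose(forward_difference(n)) == negate(backward_difference(n)))   # True
```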


Chapter 3 


Vector Spaces and Subspaces 


3.1 Spaces of Vectors 


1 The standard n-dimensional space $\mathbf{R}^n$ contains all real column vectors with n components. 

2 If v and w are in a vector space S, every combination cv + dw must be in S. 

3 The “vectors” in S can be matrices or functions of x. The 1-point space Z consists of x = 0. 

4 A subspace of $\mathbf{R}^n$ is a vector space inside $\mathbf{R}^n$. Example: The line y = 3x inside $\mathbf{R}^2$. 

5 The column space of A contains all combinations of the columns of A: a subspace of $\mathbf{R}^m$. 

6 The column space contains all the vectors Ax. So Ax = b is solvable when b is in C(A). 


To a newcomer, matrix calculations involve a lot of numbers. To you, they involve 
vectors. The columns of Ax and AB are linear combinations of n vectors—the columns 
of A. This chapter moves from numbers and vectors to a third level of understanding (the 
highest level). Instead of individual columns, we look at “spaces” of vectors. Without 
seeing vector spaces and especially their subspaces, you haven’t understood everything 
about Ax = b. 

Since this chapter goes a little deeper, it may seem a little harder. That is natural. We 
are looking inside the calculations, to find the mathematics. The author’s job is to make it 
clear. The chapter ends with the “Fundamental Theorem of Linear Algebra”. 

We begin with the most important vector spaces. They are denoted by $\mathbf{R}^1$, $\mathbf{R}^2$, $\mathbf{R}^3$, 
$\mathbf{R}^4$, .... Each space $\mathbf{R}^n$ consists of a whole collection of vectors. $\mathbf{R}^5$ contains all column 
vectors with five components. This is called “5-dimensional space”. 


DEFINITION The space $\mathbf{R}^n$ consists of all column vectors v with n components. 
The components of v are real numbers, which is the reason for the letter R. A vector whose 
n components are complex numbers lies in the space $\mathbf{C}^n$. 

The vector space $\mathbf{R}^2$ is represented by the usual xy plane. Each vector v in $\mathbf{R}^2$ has two 
components. The word “space” asks us to think of all those vectors, the whole plane. 
Each vector gives the x and y coordinates of a point in the plane: v = (x, y). 

Similarly the vectors in $\mathbf{R}^3$ correspond to points (x, y, z) in three-dimensional space. 
The one-dimensional space $\mathbf{R}^1$ is a line (like the x axis). As before, we print vectors as a 
column between brackets, or along a line using commas and parentheses: 

$$\begin{bmatrix} 4 \\ \pi \end{bmatrix} \text{ is in } \mathbf{R}^2, \qquad (1, 1, 0, 1, 1) \text{ is in } \mathbf{R}^5, \qquad \begin{bmatrix} 1 + i \\ 1 - i \end{bmatrix} \text{ is in } \mathbf{C}^2.$$

The great thing about linear algebra is that it deals easily with five-dimensional space. 
We don't draw the vectors, we just need the five numbers (or n numbers). 

To multiply v by 7, multiply every component by 7. Here 7 is a “scalar”. To add vectors 
in $\mathbf{R}^5$, add them a component at a time. The two essential vector operations go on inside 
the vector space, and they produce linear combinations: 

We can add any vectors in $\mathbf{R}^n$, and we can multiply any vector v by any scalar c. 

“Inside the vector space” means that the result stays in the space. If v is the vector in $\mathbf{R}^4$ 
with components 1, 0, 0, 1, then 2v is the vector in $\mathbf{R}^4$ with components 2, 0, 0, 2. (In this 
case 2 is the scalar.) A whole series of properties can be verified in $\mathbf{R}^n$. The commutative 
law is v + w = w + v; the distributive law is c(v + w) = cv + cw. There is a unique 
“zero vector” satisfying 0 + v = v. Those are three of the eight conditions listed at the 
start of the problem set. 

These eight conditions are required of every vector space. There are vectors other than 
column vectors, and there are vector spaces other than $\mathbf{R}^n$, and all vector spaces have to 
obey the eight reasonable rules. 

A real vector space is a set of “vectors” together with rules for vector addition and for 
multiplication by real numbers. The addition and the multiplication must produce vectors 
that are in the space. And the eight conditions must be satisfied (which is usually no 
problem). Here are three vector spaces other than $\mathbf{R}^n$: 


M The vector space of all real 2 by 2 matrices. 
F The vector space of all real functions f(x). 
Z The vector space that consists only of a zero vector. 


In M the “vectors” are really matrices. In F the vectors are functions. In Z the only addition 
is 0 + 0 = 0. In each case we can add: matrices to matrices, functions to functions, zero 
vector to zero vector. We can multiply a matrix by 4 or a function by 4 or the zero vector 
by 4. The result is still in M or F or Z. The eight conditions are all easily checked. 


The function space F is infinite-dimensional. A smaller function space is P, or $\mathbf{P}_n$, 
containing all polynomials $a_0 + a_1x + \cdots + a_nx^n$ of degree n. 
The space Z is zero-dimensional (by any reasonable definition of dimension). Z is the 
smallest possible vector space. We hesitate to call it $\mathbf{R}^0$, which means no components: 
you might think there was no vector. The vector space Z contains exactly one vector (zero). 
No space can do without that zero vector. Each space has its own zero vector: the zero 
matrix, the zero function, the vector (0, 0, 0) in $\mathbf{R}^3$. 


[Figure: a typical vector $\begin{bmatrix} a & b \\ c & d \end{bmatrix}$ in M, and the smallest vector space Z containing the zero vector only.] 

Figure 3.1: “Four-dimensional” matrix space M. The “zero-dimensional” space Z. 


Subspaces 


At different times, we will ask you to think of matrices and functions as vectors. But at all 
times, the vectors that we need most are ordinary column vectors. They are vectors with 
n components—but maybe not all of the vectors with n components. There are important 
vector spaces inside R^n. Those are subspaces of R^n. 

Start with the usual three-dimensional space R^3. Choose a plane through the origin 
(0, 0, 0). That plane is a vector space in its own right. If we add two vectors in the plane, 
their sum is in the plane. If we multiply an in-plane vector by 2 or -5, it is still in the plane. 
A plane in three-dimensional space is not R^2 (even if it looks like R^2). The vectors have 
three components and they belong to R^3. The plane is a vector space inside R^3. 

This illustrates one of the most fundamental ideas in linear algebra. The plane going 
through (0, 0, 0) is a subspace of the full vector space R^3. 


DEFINITION A subspace of a vector space is a set of vectors (including 0) that satisfies 
two requirements: If v and w are vectors in the subspace and c is any scalar, then 


(i) v + w is in the subspace (ii) cv is in the subspace. 

In other words, the set of vectors is “closed” under addition v + w and multiplication cv 
(and dw). Those operations leave us in the subspace. We can also subtract, because -w is 
in the subspace and its sum with v is v - w. In short, all linear combinations stay in the 
subspace. 

All these operations follow the rules of the host space, so the eight required conditions 
are automatic. We just have to check the linear combinations requirement for a subspace. 


First fact: Every subspace contains the zero vector. The plane in R^3 has to go through 
(0, 0, 0). We mention this separately, for extra emphasis, but it follows directly from rule (ii). 
Choose c = 0, and the rule requires 0v to be in the subspace. 

Planes that don’t contain the origin fail those tests. Those planes are not subspaces. 

Lines through the origin are also subspaces. When we multiply by 5, or add two 
vectors on the line, we stay on the line. But the line must go through (0, 0, 0). 

Another subspace is all of R^3. The whole space is a subspace (of itself). Here is a list 
of all the possible subspaces of R^3: 


(L) Any line through (0, 0, 0)          (R^3) The whole space 
(P) Any plane through (0, 0, 0)         (Z) The single vector (0, 0, 0) 


If we try to keep only part of a plane or line, the requirements for a subspace don’t 
hold. Look at these examples in R^2. They are not subspaces. 


Example 1 Keep only the vectors (x, y) whose components are positive or zero (this is 
a quarter-plane). The vector (2, 3) is included but (-2, -3) is not. So rule (ii) is violated 
when we try to multiply by c = -1. The quarter-plane is not a subspace. 


Example 2 Include also the vectors whose components are both negative. Now we have 
two quarter-planes. Requirement (ii) is satisfied; we can multiply by any c. But rule (i) 
now fails. The sum of v = (2, 3) and w = (-3, -2) is (-1, 1), which is outside the 
quarter-planes. Two quarter-planes don’t make a subspace. 
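
Examples 1 and 2 can be checked mechanically. The following is a minimal sketch, not part of the original text; the helper names `in_quarter_plane`, `in_two_quarter_planes`, `add` and `scale` are illustrative assumptions. It tests rule (ii) on the quarter-plane and rule (i) on the two quarter-planes with the same vectors used above.

```python
def in_quarter_plane(v):
    """Example 1: both components are positive or zero."""
    x, y = v
    return x >= 0 and y >= 0

def in_two_quarter_planes(v):
    """Example 2: both components >= 0, or both components <= 0."""
    x, y = v
    return (x >= 0 and y >= 0) or (x <= 0 and y <= 0)

def add(v, w):
    return (v[0] + w[0], v[1] + w[1])

def scale(c, v):
    return (c * v[0], c * v[1])

v, w = (2, 3), (-3, -2)

# Example 1: rule (ii) fails, because c = -1 takes v out of the quarter-plane.
example1_scaled = scale(-1, v)
print(in_quarter_plane(v), in_quarter_plane(example1_scaled))

# Example 2: v and w are both in the set, but rule (i) fails for v + w.
example2_sum = add(v, w)
print(in_two_quarter_planes(v), in_two_quarter_planes(w),
      in_two_quarter_planes(example2_sum))
```

One failing pair is enough to show that a set is not a subspace. Passing a few checks, on the other hand, would prove nothing.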

Rules (i) and (ii) involve vector addition v + w and multiplication by scalars c and d. 
The rules can be combined into a single requirement—the rule for subspaces: 


A subspace containing v and w must contain all linear combinations cv + dw. 


Example 3 Inside the vector space M of all 2 by 2 matrices, here are two subspaces: 


(U) All upper triangular matrices $\begin{bmatrix} a & b \\ 0 & d \end{bmatrix}$          (D) All diagonal matrices $\begin{bmatrix} a & 0 \\ 0 & d \end{bmatrix}$ 


Add any two matrices in U, and the sum is in U. Add diagonal matrices, and the sum is 
diagonal. In this case D is also a subspace of U! Of course the zero matrix is in these 
subspaces, when a, b, and d all equal zero. Z is always a subspace. 

Multiples of the identity matrix also form a subspace. 2I + 3I is in this subspace, and 
so is 3 times 4I. The matrices cI form a “line of matrices” inside M and U and D. 

Is the matrix I a subspace by itself? Certainly not. Only the zero matrix is. Your mind 
will invent more subspaces of 2 by 2 matrices. Write them down for Problem 5. 
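
Example 3 can be spot-checked on sample matrices. This is a sketch under stated assumptions: the matrices `U1`, `U2`, `D1`, `D2` and the helper names are hypothetical choices, and a few numerical checks illustrate the closure rules without proving them.

```python
def combo(c, A, d, B):
    """Return the linear combination cA + dB of two 2 by 2 matrices."""
    return [[c * A[i][j] + d * B[i][j] for j in range(2)] for i in range(2)]

def is_upper_triangular(A):
    return A[1][0] == 0

def is_diagonal(A):
    return A[0][1] == 0 and A[1][0] == 0

# Hypothetical sample matrices (not taken from the text).
U1 = [[1, 2], [0, 3]]
U2 = [[4, -5], [0, 6]]
D1 = [[7, 0], [0, -1]]
D2 = [[2, 0], [0, 9]]
I = [[1, 0], [0, 1]]

u_combo = combo(2, U1, -3, U2)   # stays upper triangular
d_combo = combo(2, D1, -3, D2)   # stays diagonal
i_combo = combo(2, I, 3, I)      # 2I + 3I is again a multiple of I
twice_i = combo(2, I, 0, I)      # 2I differs from I, so {I} alone is not a subspace

print(is_upper_triangular(u_combo), is_diagonal(d_combo), i_combo, twice_i != I)
```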


The Column Space of A 


The most important subspaces are tied directly to a matrix A. We are trying to solve 
Ax = b. If A is not invertible, the system is solvable for some b and not solvable for other 
b. We want to describe the good right sides b: the vectors that can be written as A times 
some vector x. Those b's form the “column space” of A. 

Remember that Ax is a combination of the columns of A. To get every possible b, we 
use every possible x. Start with the columns of A and take all their linear combinations. 
This produces the column space of A. It is a vector space made up of column vectors. 


C(A) contains not just the n columns of A, but all their combinations Ax. 


DEFINITION The column space consists of all linear combinations of the columns. 
The combinations are all possible vectors Ax. They fill the column space C(A). 


This column space is crucial to the whole book, and here is why. To solve Ax = b is to 
express b as a combination of the columns. The right side b has to be in the column space 
produced by A on the left side, or no solution! 


The system Ax = b is solvable if and only if b is in the column space of A. 


When b is in the column space, it is a combination of the columns. The coefficients in 
that combination give us a solution x to the system Ax = b. 

Suppose A is an m by n matrix. Its columns have m components (not n). So the 
columns belong to R^m. The column space of A is a subspace of R^m (not R^n). The set 
of all column combinations Ax satisfies rules (i) and (ii) for a subspace: When we add 
linear combinations or multiply by scalars, we still produce combinations of the columns. 
The word “subspace” is justified by taking all linear combinations. 

Here is a 3 by 2 matrix A, whose column space is a subspace of R^3. The column space 
of A is a plane in Figure 3.2. With only 2 columns, C(A) can’t be all of R^3. 


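Example 4 

$$Ax \text{ is } \begin{bmatrix} 1 & 0 \\ 4 & 3 \\ 2 & 3 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} \text{ which is } x_1 \begin{bmatrix} 1 \\ 4 \\ 2 \end{bmatrix} + x_2 \begin{bmatrix} 0 \\ 3 \\ 3 \end{bmatrix}$$ 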


The column space of all combinations of the two columns fills up a plane in R^3. 
We drew one particular b (a combination of the columns). This b = Ax lies on the plane. 
The plane has zero thickness, so most right sides b in R^3 are not in the column space. For 
most b there is no solution to our 3 equations in 2 unknowns. 
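
One way to test whether a given b lies in the column space is to compare pivot counts: b is in C(A) exactly when adding b as an extra column creates no new pivot. The sketch below applies that test to the matrix of Example 4. The function names `rank` and `in_column_space` and the two test vectors are assumptions made for illustration; exact fractions avoid rounding questions.

```python
from fractions import Fraction

def rank(rows):
    """Count the pivots found by forward elimination (exact arithmetic)."""
    M = [[Fraction(x) for x in row] for row in rows]
    m, n = len(M), len(M[0])
    r = 0
    for col in range(n):
        if r == m:
            break
        # Look for a nonzero entry in this column, at or below row r.
        pivot = next((i for i in range(r, m) if M[i][col] != 0), None)
        if pivot is None:
            continue  # no pivot in this column
        M[r], M[pivot] = M[pivot], M[r]
        for i in range(r + 1, m):
            factor = M[i][col] / M[r][col]
            M[i] = [a - factor * b for a, b in zip(M[i], M[r])]
        r += 1
    return r

def in_column_space(A, b):
    """b is in C(A) when the augmented matrix [A b] has no extra pivot."""
    augmented = [row + [bi] for row, bi in zip(A, b)]
    return rank(augmented) == rank(A)

A = [[1, 0], [4, 3], [2, 3]]     # the 3 by 2 matrix of Example 4
b_on_plane = [2, 11, 7]          # 2 (column 1) + 1 (column 2)
b_off_plane = [1, 0, 0]          # a right side that misses the plane

print(in_column_space(A, b_on_plane), in_column_space(A, b_off_plane))
```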

[Figure 3.2 shows the plane C(A) = all vectors Ax, with one right side b = Ax drawn on it.] 

Figure 3.2: The column space C(A) is a plane containing the two columns. Ax = b is 
solvable when b is on that plane. Then b is a combination of the columns. 


Of course (0, 0, 0) is in the column space. The plane passes through the origin. There 
is certainly a solution to Ax = 0. That solution, always available, is x = ____. 

To repeat, the attainable right sides b are exactly the vectors in the column space. One 
possibility is the first column itself: take x1 = 1 and x2 = 0. Another combination is the 
second column: take x1 = 0 and x2 = 1. The new level of understanding is to see all 
combinations: the whole subspace is generated by those two columns. 


Notation The column space of A is denoted by C(A). Start with the columns and take all 
their linear combinations. We might get the whole R^m or only a subspace. 


Important Instead of columns in R^n, we could start with any set S of vectors in a vector 
space V. To get a subspace SS of V, we take all combinations of the vectors in that set: 


S = set of vectors in V (probably not a subspace) 
SS = all combinations of vectors in S (definitely a subspace) 
SS = all c1 v1 + ... + cN vN = the subspace of V “spanned” by S 


When S is the set of columns, SS is the column space. When there is only one nonzero 
vector v in S, the subspace SS is the line through v. Always SS is the smallest subspace 
containing S. This is a fundamental way to create subspaces and we will come back to it. 
To repeat: The columns “span” the column space. 


The subspace SS is the “span” of S, containing all combinations of vectors in S. 
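
Example 5 Describe the column spaces (they are subspaces of R^2) for 

$$I = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} \quad\text{and}\quad A = \begin{bmatrix} 1 & 2 \\ 2 & 4 \end{bmatrix} \quad\text{and}\quad B = \begin{bmatrix} 1 & 2 & 3 \\ 0 & 0 & 4 \end{bmatrix}.$$ 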


Solution The column space of I is the whole space R^2. Every vector is a combination of 
the columns of I. In vector space language, C(I) is R^2. 

The column space of A is only a line. The second column (2, 4) is a multiple of the first 
column (1,2). Those vectors are different, but our eye is on vector spaces. The column 
space contains (1, 2) and (2,4) and all other vectors (c, 2c) along that line. The equation 
Ax = b is only solvable when b is on the line. 

For the third matrix (with three columns) the column space C(B) is all of R^2. Every b 
is attainable. The vector b = (5, 4) is column 2 plus column 3, so x can be (0, 1, 1). The 
same vector (5, 4) is also 2(column 1) + column 3, so another possible x is (2, 0, 1). This 
matrix has the same column space as I: any b is allowed. But now x has extra components 
and there are more solutions, more combinations that give b. 
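
The two solutions named in Example 5 can be confirmed by direct multiplication. In this small sketch the helper `matvec` is an illustrative name, not something from the text. The last lines also check that the difference of the two solutions is sent to zero by B, which is why more than one combination can reach the same b.

```python
def matvec(A, x):
    """Multiply a matrix (list of rows) by a vector: a combination of the columns."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

B = [[1, 2, 3],
     [0, 0, 4]]

b_from_first = matvec(B, [0, 1, 1])    # column 2 + column 3
b_from_second = matvec(B, [2, 0, 1])   # 2 (column 1) + column 3

# The difference of two solutions of Bx = b must solve Bx = 0.
difference = [2 - 0, 0 - 1, 1 - 1]
b_from_difference = matvec(B, difference)

print(b_from_first, b_from_second, b_from_difference)
```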


The next section creates a vector space N (A), to describe all the solutions of Ax = 0. 
This section created the column space C(A), to describe all the attainable right sides b. 


= REVIEW OF THE KEY IDEAS = 


1. R^n contains all column vectors with n real components. 
2. M (2 by 2 matrices) and F (functions) and Z (zero vector alone) are vector spaces. 
3. A subspace containing v and w must contain all their combinations cv + dw. 


4. The combinations of the columns of A form the column space C(A). Then the 
column space is “spanned” by the columns. 


5. Ax = b has a solution exactly when b is in the column space of A. 


C(A) = all combinations of the columns = all vectors Ax. 


= WORKED EXAMPLES = 


3.1 A We are given three different vectors b1, b2, b3. Construct a matrix so that the 
equations Ax = b1 and Ax = b2 are solvable, but Ax = b3 is not solvable. How can you 
decide if this is possible? How could you construct A? 

Solution We want to have b1 and b2 in the column space of A. Then Ax = b1 and 
Ax = b2 will be solvable. The quickest way is to make b1 and b2 the two columns of A. 
Then the solutions are x = (1, 0) and x = (0, 1). 

Also, we don’t want Ax = b3 to be solvable. So don’t make the column space any 
larger! Keeping only the columns b1 and b2, the question is: 


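$$\text{Is } Ax = \begin{bmatrix} b_1 & b_2 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = b_3 \text{ solvable?} \qquad \text{Is } b_3 \text{ a combination of } b_1 \text{ and } b_2\text{?}$$ 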


If the answer is no, we have the desired matrix A. If the answer is yes, then it is not possible 
to construct A. When the column space contains b1 and b2, it will have to contain all their 
linear combinations. So b3 would necessarily be in that column space and Ax = b3 would 
necessarily be solvable. 


3.1 B Describe a subspace S of each vector space V, and then a subspace SS of S. 


V1 = all combinations of (1, 1, 0, 0) and (1, 1, 1, 0) and (1, 1, 1, 1) 
V2 = all vectors perpendicular to u = (1, 2, 1), so u · v = 0 
V3 = all symmetric 2 by 2 matrices (a subspace of M) 
V4 = all solutions to the equation d^4y/dx^4 = 0 (a subspace of F) 


Describe each V two ways: “All combinations of...” “All solutions of the equations...” 


Solution V1 starts with three vectors. A subspace S comes from all combinations of the 
first two vectors (1, 1, 0, 0) and (1, 1, 1, 0). A subspace SS of S comes from all multiples 
(c, c, 0, 0) of the first vector. So many possibilities. 

A subspace S of V2 is the line through (1, -1, 1). This line is perpendicular to u. The 
vector x = (0, 0, 0) is in S and all its multiples cx give the smallest subspace SS = Z. 

The diagonal matrices are a subspace S of the symmetric matrices. The multiples cI 
are a subspace SS of the diagonal matrices. 

V4 contains all cubic polynomials y = a + bx + cx^2 + dx^3, with d^4y/dx^4 = 0. The 
quadratic polynomials give a subspace S. The linear polynomials are one choice of SS. 
The constants could be SSS. 

In all four parts we could take S = V itself, and SS = the zero subspace Z. 

Each V can be described as all combinations of .... and as all solutions of ....: 


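V1 = all combinations of the 3 vectors                 V1 = all solutions of v1 - v2 = 0 
V2 = all combinations of (1, 0, -1) and (1, -1, 1)     V2 = all solutions of u · v = 0 
V3 = all combinations of $\begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix}$, $\begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}$, $\begin{bmatrix} 0 & 0 \\ 0 & 1 \end{bmatrix}$     V3 = all solutions $\begin{bmatrix} a & b \\ c & d \end{bmatrix}$ of b = c 
V4 = all combinations of 1, x, x^2, x^3                V4 = all solutions to d^4y/dx^4 = 0 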

Problem Set 3.1 


The first problems 1-8 are about vector spaces in general. The vectors in those spaces 
are not necessarily column vectors. In the definition of a vector space, vector addition 
x + y and scalar multiplication cx must obey the following eight rules: 


(1) x + y = y + x 
(2) x + (y + z) = (x + y) + z 
(3) There is a unique “zero vector” such that x + 0 = x for all x 
(4) For each x there is a unique vector -x such that x + (-x) = 0 
(5) 1 times x equals x 
(6) (c1c2)x = c1(c2x)             (1) to (4) about x + y 
(7) c(x + y) = cx + cy            (5) to (6) about cx 
(8) (c1 + c2)x = c1x + c2x.       (7) to (8) connects them 


1 Suppose (x1, x2) + (y1, y2) is defined to be (x1 + y2, x2 + y1). With the usual 
multiplication cx = (cx1, cx2), which of the eight conditions are not satisfied? 


2 Suppose the multiplication cx is defined to produce (cx1, 0) instead of (cx1, cx2). 
With the usual addition in R^2, are the eight conditions satisfied? 


3 (a) Which rules are broken if we keep only the positive numbers x > 0 in R^1? 
Every c must be allowed. The half-line is not a subspace. 

(b) The positive numbers with x + y and cx redefined to equal the usual xy and 
x^c do satisfy the eight rules. Test rule 7 when c = 3, x = 2, y = 1. (Then 
x + y = 2 and cx = 8.) Which number acts as the “zero vector”? 


4 The matrix A = F a is a “vector” in the space M of all 2 by 2 matrices. Write 


down the zero vector in this space, the vector 5A, and the vector —A. What matrices 
are in the smallest subspace containing A? 


5 (a) Describe a subspace of M that contains A = [§ 8] but not B= [6 _9]. 
(b) If a subspace of M does contain A and B, must it contain I? 
(c) Describe a subspace of M that contains no nonzero diagonal matrices. 

6 The functions f(x) = x^2 and g(x) = 5x are “vectors” in F. This is the vector 
space of all real functions. (The functions are defined for -∞ < x < ∞.) The 
combination 3f(x) - 4g(x) is the function h(x) = ____. 

7 Which rule is broken if multiplying f(x) by c gives the function f(cx)? Keep the 
usual addition f(x) + g(x). 


8 If the sum of the “vectors” f(x) and g(x) is defined to be the function f(g(x)), then 
the “zero vector” is g(x) = x. Keep the usual scalar multiplication cf (x) and find 
two rules that are broken. 


Questions 9-18 are about the “subspace requirements”: x + y and cx (and then all 
linear combinations cx + dy) stay in the subspace. 


9 One requirement can be met while the other fails. Show this by finding 


(a) A set of vectors in R^2 for which x + y stays in the set but (1/2)x may be outside. 
(b) A set of vectors in R^2 (other than two quarter-planes) for which every cx stays 
in the set but x + y may be outside. 

10 Which of the following subsets of R^3 are actually subspaces? 


(a) The plane of vectors (b1, b2, b3) with b1 = b2. 
(b) The plane of vectors with b1 = 1. 
(c) The vectors with b1b2b3 = 0. 
(d) All linear combinations of v = (1, 4, 0) and w = (2, 2, 2). 
(e) All vectors that satisfy b1 + b2 + b3 = 0. 
(f) All vectors with b1 < b2 < b3. 


11 Describe the smallest subspace of the matrix space M that contains 


1 0 0 1 1 1 1 0 1 0 
(a) lo o mg | (b) i J ©) f J a E ip 

12 Let P be the plane in R^3 with equation x + y - 2z = 4. The origin (0, 0, 0) is not in 
P! Find two vectors in P and check that their sum is not in P. 


13 Let P0 be the plane through (0, 0, 0) parallel to the previous plane P. What is the 
equation for P0? Find two vectors in P0 and check that their sum is in P0. 


14 The subspaces of R^3 are planes, lines, R^3 itself, or Z containing only (0, 0, 0). 

(a) Describe the three types of subspaces of R^2. 
(b) Describe all subspaces of D, the space of 2 by 2 diagonal matrices. 

15 (a) The intersection of two planes through (0, 0, 0) is probably a ____ in R^3 but 
it could be a ____. It can’t be Z! 
(b) The intersection of a plane through (0, 0, 0) with a line through (0, 0, 0) is 
probably a ____ but it could be a ____. 
(c) If S and T are subspaces of R^5, prove that their intersection S ∩ T is a 
subspace of R^5. Here S ∩ T consists of the vectors that lie in both subspaces. 
Check that x + y and cx are in S ∩ T if x and y are in both spaces. 


16 Suppose P is a plane through (0, 0, 0) and L is a line through (0, 0, 0). The smallest 
vector space containing both P and L is either ____ or ____. 


17 (a) Show that the set of invertible matrices in M is not a subspace. 


(b) Show that the set of singular matrices in M is not a subspace. 
18 True or false (check addition in each case by an example): 


(a) The symmetric matrices in M (with A^T = A) form a subspace. 
(b) The skew-symmetric matrices in M (with A^T = -A) form a subspace. 
(c) The unsymmetric matrices in M (with A^T ≠ A) form a subspace. 


Questions 19-27 are about column spaces C (A) and the equation Ax = b. 


19 Describe the column spaces (lines or planes) of these particular matrices: 


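$$A = \begin{bmatrix} 1 & 2 \\ 0 & 0 \\ 0 & 0 \end{bmatrix} \quad\text{and}\quad B = \begin{bmatrix} 1 & 0 \\ 0 & 2 \\ 0 & 0 \end{bmatrix} \quad\text{and}\quad C = \begin{bmatrix} 1 & 0 \\ 2 & 0 \\ 0 & 0 \end{bmatrix}$$ 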


20 For which right sides (find a condition on b1, b2, b3) are these systems solvable? 


1 4 2) [a by l1 4 m by 
(az) | 2 8 4] ja] = | bo (b) 2 9 z = | bo 
th 2A 29) | aes b3 a ee z bs 


21 Adding row 1 of A to row 2 produces B. Adding column 1 to column 2 produces C. 
A combination of the columns of (B or C ?) is also a combination of the columns of 
A. Which two matrices have the same column ____? 
1 2 1 2 1 3 
A= E and B= E J and C= ; l ! 
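
22 For which vectors (b1, b2, b3) do these systems have a solution? 

$$\begin{bmatrix} 1 & 1 & 1 \\ 0 & 1 & 1 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} b_1 \\ b_2 \\ b_3 \end{bmatrix} \quad\text{and}\quad \begin{bmatrix} 1 & 1 & 1 \\ 0 & 1 & 1 \\ 0 & 0 & 0 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} b_1 \\ b_2 \\ b_3 \end{bmatrix} \quad\text{and}\quad \begin{bmatrix} 1 & 1 & 1 \\ 0 & 0 & 1 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} b_1 \\ b_2 \\ b_3 \end{bmatrix}$$ 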

(Recommended) If we add an extra column b to a matrix A, then the column space 
gets larger unless ____. Give an example where the column space gets larger and 
an example where it doesn’t. Why is Ax = b solvable exactly when the column 
space doesn’t get larger (it is the same for A and [A b])? 


The columns of AB are combinations of the columns of A. This means: The column 
space of AB is contained in (possibly equal to) the column space of A. Give an 
example where the column spaces of A and AB are not equal. 


Suppose Ax = b and Ay = b* are both solvable. Then Az = b + b* is solvable. 
What is z? This translates into: If b and b* are in the column space C(A), then 
b + b* is in C(A). 


If A is any 5 by 5 invertible matrix, then its column space is ____. Why? 

True or false (with a counterexample if false): 


(a) The vectors b that are not in the column space C(A) form a subspace. 
(b) If C(A) contains only the zero vector, then A is the zero matrix. 

(c) The column space of 2A equals the column space of A. 

(d) The column space of A — I equals the column space of A (test this). 


Construct a 3 by 3 matrix whose column space contains (1, 1, 0) and (1,0, 1) but not 
(1, 1, 1). Construct a 3 by 3 matrix whose column space is only a line. 


If the 9 by 12 system Ax = b is solvable for every b, then C(A) = ____. 


Challenge Problems 


Suppose S and T are two subspaces of a vector space V. 


(a) Definition: The sum S + T contains all sums s + t of a vector s in S and a 
vector t in T. Show that S + T satisfies the requirements (addition and scalar 
multiplication) for a vector space. 


(b) If S and T are lines in R^m, what is the difference between S + T and S ∪ T? 
That union contains all vectors from S or T or both. Explain this statement: 
The span of S U T is S + T. (Section 3.5 returns to this word “span”.) 


If S is the column space of A and T is C(B), then S + T is the column space of 
what matrix M? The columns of A and B and M are all in R^m. (I don’t think A + B 
is always a correct M.) 


Show that the matrices A and [A AB] (with extra columns) have the same column 
space. But find a square matrix with C(A^2) smaller than C(A). Important point: 


An n by n matrix has C(A) = R^n exactly when A is an ____ matrix. 

3.2 The Nullspace of A: Solving Ax = 0 and Rx = 0 


1 The nullspace N(A) in R^n contains all solutions x to Ax = 0. This includes x = 0. 
2 Elimination (from A to U to R) does not change the nullspace: N(A) = N(U) = N(R). 
3 The reduced row echelon form R = rref(A) has all pivots = 1, with zeros above and below. 
4 If column j of R is free (no pivot), there is a “special solution” to Ax = 0 with x_j = 1. 
5 Number of pivots = number of nonzero rows in R = rank r. There are n - r free columns. 
6 Every matrix with m < n has nonzero solutions to Ax = 0 in its nullspace. 


This section is about the subspace containing all solutions to Ax = 0. The m by n matrix 
A can be square or rectangular. The right hand side is b = 0. One immediate solution is 
x = 0. For invertible matrices this is the only solution. For other matrices, not invertible, 
there are nonzero solutions to Ax = 0. Each solution x belongs to the nullspace of A. 
Elimination will find all solutions and identify this very important subspace. 


The nullspace N(A) consists of all solutions to Ax = 0. These vectors x are in R^n. 


Check that the solution vectors form a subspace. Suppose x and y are in the nullspace (this 
means Ax = 0 and Ay = 0). The rules of matrix multiplication give A(x + y) = 0 + 0. 
The rules also give A(cx) = c0. The right sides are still zero. Therefore x + y and cx are 
also in the nullspace N(A). Since we can add and multiply without leaving the nullspace, 
it is a subspace. 

To repeat: The solution vectors x have n components. They are vectors in R^n, so 
the nullspace is a subspace of R^n. The column space C(A) is a subspace of R^m. 


Example 1 Describe the nullspace of $A = \begin{bmatrix} 1 & 2 \\ 3 & 6 \end{bmatrix}$. This matrix is singular! 


Solution Apply elimination to the linear equations Ax = 0: 


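$$\begin{aligned} x_1 + 2x_2 &= 0 \\ 3x_1 + 6x_2 &= 0 \end{aligned} \quad\rightarrow\quad \begin{aligned} x_1 + 2x_2 &= 0 \\ 0 &= 0 \end{aligned}$$ 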


There is really only one equation. The second equation is the first equation multiplied 
by 3. In the row picture, the line x1 + 2x2 = 0 is the same as the line 3x1 + 6x2 = 0. 
That line is the nullspace N(A). It contains all solutions (x1, x2). 


To describe the solutions to Ax = 0, here is an efficient way. Choose one point on 
the line (one “special solution”). Then all points on the line are multiples of this one. 
We choose the second component to be x2 = 1 (a special choice). From the equation 
x1 + 2x2 = 0, the first component must be x1 = -2. The special solution is s = (-2, 1). 

$$\text{Special solution } As = 0: \qquad \text{The nullspace of } A = \begin{bmatrix} 1 & 2 \\ 3 & 6 \end{bmatrix} \text{ contains all multiples of } s = \begin{bmatrix} -2 \\ 1 \end{bmatrix}.$$ 


This is the best way to describe the nullspace, by computing special solutions to Ax = 0. 
The solution is special because we set the free variable to x2 = 1. 


The nullspace of A consists of all combinations of the special solutions to Ax = 0. 
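
A quick numerical check of Example 1, as a sketch: multiply A by several multiples of the special solution s = (-2, 1) and by one vector off the line. The helper `matvec` and the sample multipliers are illustrative choices, not part of the text.

```python
def matvec(A, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

A = [[1, 2],
     [3, 6]]
s = [-2, 1]                      # special solution: free variable x2 = 1

# Every multiple of s should be sent to the zero vector.
multiples = [[c * si for si in s] for c in (-3, 0, 1, 5)]
images = [matvec(A, x) for x in multiples]

# A vector off the line x1 + 2*x2 = 0 is not in the nullspace.
off_line_image = matvec(A, [1, 1])

print(images, off_line_image)
```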


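Example 2 x + 2y + 3z = 0 comes from the 1 by 3 matrix A = [1 2 3]. Then 
Ax = 0 produces a plane. All vectors on the plane are perpendicular to (1, 2, 3). 
The plane is the nullspace of A. There are two free variables y and z: Set to 0 and 1. 

$$\begin{bmatrix} 1 & 2 & 3 \end{bmatrix} \begin{bmatrix} x \\ y \\ z \end{bmatrix} = 0 \text{ has two special solutions } s_1 = \begin{bmatrix} -2 \\ 1 \\ 0 \end{bmatrix} \text{ and } s_2 = \begin{bmatrix} -3 \\ 0 \\ 1 \end{bmatrix}$$ 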


Those vectors s1 and s2 lie on the plane x + 2y + 3z = 0. All vectors on the plane are 
combinations of s1 and s2. 

Notice what is special about s1 and s2. The last two components are “free” and we 
choose them specially as 1, 0 and 0, 1. Then the first components -2 and -3 are 
determined by the equation Ax = 0. 
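
To see that the free components y and z really determine the whole vector, here is a small sketch. The names `combination` and `on_plane` and the sample values of y and z are assumptions for illustration. The combination y s1 + z s2 has last two components y and z, and its first component is forced to be -2y - 3z.

```python
s1 = (-2, 1, 0)   # special solution with y = 1, z = 0
s2 = (-3, 0, 1)   # special solution with y = 0, z = 1

def combination(y, z):
    """Return y*s1 + z*s2; its last two components are exactly y and z."""
    return tuple(y * a + z * b for a, b in zip(s1, s2))

def on_plane(v):
    """Test x + 2y + 3z = 0, which says v is perpendicular to (1, 2, 3)."""
    return v[0] + 2 * v[1] + 3 * v[2] == 0

sample = combination(4, -7)
print(sample, on_plane(sample), on_plane((1, 1, 1)))
```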


The solutions to x + 2y + 3z = 6 also lie on a plane, but that plane is not a subspace. 
The vector x = 0 is only a solution if b = 0. Section 3.3 will show how the solutions to 
Ax = b (if there are any solutions) are shifted away from zero by one particular solution. 


The two key steps of this section are (1) reducing A to its row echelon form R 
(2) finding the special solutions to Ax = 0 


The display on page 138 shows 4 by 5 matrices A and R, with 3 pivots. 


The equations Ax = 0 and also Rx = 0 have 5 - 3 = 2 special solutions s1 and s2. 


Pivot Columns and Free Columns 


The first column of A = [1 2 3] contains the only pivot, so the first component of x 
is not free. The free components correspond to columns with no pivots. The special 
choice (one or zero) is only for the free variables in the special solutions. 


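Example 3 Find the nullspaces of A, B, C and the two special solutions to Cx = 0. 

$$A = \begin{bmatrix} 1 & 2 \\ 3 & 8 \end{bmatrix} \qquad B = \begin{bmatrix} A \\ 2A \end{bmatrix} = \begin{bmatrix} 1 & 2 \\ 3 & 8 \\ 2 & 4 \\ 6 & 16 \end{bmatrix} \qquad C = \begin{bmatrix} A & 2A \end{bmatrix} = \begin{bmatrix} 1 & 2 & 2 & 4 \\ 3 & 8 & 6 & 16 \end{bmatrix}.$$ 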
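
Solution The equation Ax = 0 has only the zero solution x = 0. The nullspace is Z. 
It contains only the single point x = 0 in R^2. This fact comes from elimination: 

$$Ax = \begin{bmatrix} 1 & 2 \\ 3 & 8 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix} \text{ yields } \begin{bmatrix} 1 & 2 \\ 0 & 2 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix} \text{ and } \begin{bmatrix} x_1 = 0 \\ x_2 = 0 \end{bmatrix}$$ 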


A is invertible. There are no special solutions. Both columns of this matrix have pivots. 

The rectangular matrix B has the same nullspace Z. The first two equations in Bx = 0 
again require x = 0. The last two equations would also force x = 0. When we add extra 
equations (giving extra rows), the nullspace certainly cannot become larger. The extra rows 
impose more conditions on the vectors x in the nullspace. 


The rectangular matrix C is different. It has extra columns instead of extra rows. The 
solution vector x has four components. Elimination will produce pivots in the first two 
columns of C, but the last two columns of C and U are “free”. They don’t have pivots: 


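$$\text{Subtract 3 (row 1) from row 2:} \qquad C = \begin{bmatrix} 1 & 2 & 2 & 4 \\ 3 & 8 & 6 & 16 \end{bmatrix} \text{ becomes } U = \begin{bmatrix} 1 & 2 & 2 & 4 \\ 0 & 2 & 0 & 4 \end{bmatrix}$$ 

Columns 1 and 2 are the pivot columns. Columns 3 and 4 are the free columns. 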


For the free variables x3 and x4, we make special choices of ones and zeros. First x3 = 1, 
x4 = 0 and second x3 = 0, x4 = 1. The pivot variables x1 and x2 are determined by the 
equation Ux = 0 (or Cx = 0 or eventually Rx = 0). We get two special solutions in the 
nullspace of C. This is also the nullspace of U: elimination doesn’t change solutions. 


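$$\text{Special solutions } (Cs = 0,\ Us = 0): \qquad s_1 = \begin{bmatrix} -2 \\ 0 \\ 1 \\ 0 \end{bmatrix} \text{ and } s_2 = \begin{bmatrix} 0 \\ -2 \\ 0 \\ 1 \end{bmatrix} \quad \begin{matrix} \leftarrow \text{pivot} \\ \leftarrow \text{variables} \\ \leftarrow \text{free} \\ \leftarrow \text{variables} \end{matrix}$$ 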


The Reduced Row Echelon Form R 


When A is rectangular, elimination will not stop at the upper triangular U. We can continue 
to make this matrix simpler, in two ways. These steps bring us to the best matrix R: 


1. Produce zeros above the pivots. Use pivot rows to eliminate upward in R. 


2. Produce ones in the pivots. Divide the whole pivot row by its pivot. 


Those steps don’t change the zero vector on the right side of the equation. The nullspace 
stays the same: N (A) = N (U) = N(R). This nullspace becomes easiest to see when we 
reach the reduced row echelon form R = rref (A). The pivot columns of R contain I. 

$$\text{Reduced form } R: \qquad U = \begin{bmatrix} 1 & 2 & 2 & 4 \\ 0 & 2 & 0 & 4 \end{bmatrix} \text{ becomes } R = \begin{bmatrix} 1 & 0 & 2 & 0 \\ 0 & 1 & 0 & 2 \end{bmatrix}$$ 

I subtracted row 2 of U from row 1. Then I multiplied row 2 by 1/2 to get pivot = 1. 


Now (free column 3) = 2 (pivot column 1), so -2 appears in s1 = (-2, 0, 1, 0). The 
special solutions are much easier to find from the reduced system Rx = 0. In each free 
column of R, I change all the signs to find s. Second special solution s2 = (0, -2, 0, 1). 
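
The reduction from C to R can be reproduced in a few lines. The sketch below is one possible implementation, not the author's code: the function name `rref` is borrowed from the notation R = rref(A), and this version clears entries above and below each pivot in a single pass (Gauss-Jordan) instead of stopping first at U. Exact fractions keep the arithmetic free of rounding.

```python
from fractions import Fraction

def rref(rows):
    """Return (R, pivot_columns) for a matrix given as a list of rows."""
    R = [[Fraction(x) for x in row] for row in rows]
    m, n = len(R), len(R[0])
    pivots = []
    r = 0                                  # row where the next pivot will go
    for col in range(n):
        if r == m:
            break
        pivot = next((i for i in range(r, m) if R[i][col] != 0), None)
        if pivot is None:
            continue                       # free column: no pivot here
        R[r], R[pivot] = R[pivot], R[r]
        p = R[r][col]
        R[r] = [x / p for x in R[r]]       # produce a one in the pivot
        for i in range(m):
            if i != r and R[i][col] != 0:  # produce zeros above and below
                f = R[i][col]
                R[i] = [a - f * b for a, b in zip(R[i], R[r])]
        pivots.append(col)
        r += 1
    return R, pivots

C = [[1, 2, 2, 4],
     [3, 8, 6, 16]]
R, pivots = rref(C)
print([[int(x) for x in row] for row in R], pivots)
```

The pivot columns are reported with zero-based indices, so `[0, 1]` means columns 1 and 2 in the numbering of the text.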


Before moving to m by n matrices A and their nullspaces N(A) and special solutions, 
allow me to repeat one comment. For many matrices, the only solution to Ax = 0 is 
x = 0. Their nullspaces N(A) = Z contain only that zero vector: no special solutions. 
The only combination of the columns that produces b = 0 is then the “zero combination”. 
The solution to Ax = 0 is trivial (just x = 0) but the idea is not trivial. 

This case of a zero nullspace Z is of the greatest importance. It says that the columns 
of A are independent. No combination of columns gives the zero vector (except the zero 
combination). All columns have pivots, and no columns are free. You will see this idea of 
independence again... 


Pivot Variables and Free Variables in the Echelon Matrix R 


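$$A = \begin{bmatrix} p & p & f & p & f \\ p & p & f & p & f \\ p & p & f & p & f \\ p & p & f & p & f \end{bmatrix} \qquad R = \begin{bmatrix} 1 & 0 & a & 0 & c \\ 0 & 1 & b & 0 & d \\ 0 & 0 & 0 & 1 & e \\ 0 & 0 & 0 & 0 & 0 \end{bmatrix} \qquad s_1 = \begin{bmatrix} -a \\ -b \\ 1 \\ 0 \\ 0 \end{bmatrix} \quad s_2 = \begin{bmatrix} -c \\ -d \\ 0 \\ -e \\ 1 \end{bmatrix}$$ 

A: 3 pivot columns p, 2 free columns f, to be revealed by R 
R: I in pivot columns, F in free columns, 3 pivots: rank r = 3 
s1 and s2: special Rs1 = 0 and Rs2 = 0, take -a to -e from R, Rs = 0 means As = 0 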


R shows clearly: column 3 = a(column 1) + b(column 2). The same must be true for A. 
The special solution s1 repeats that combination so (-a, -b, 1, 0, 0) has Rs1 = 0. 
Nullspace of A = Nullspace of R = all combinations of s1 and s2. 


Here are those steps for a 4 by 7 reduced row echelon matrix R with three pivots: 


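$$R = \begin{bmatrix} 1 & 0 & \times & \times & \times & 0 & \times \\ 0 & 1 & \times & \times & \times & 0 & \times \\ 0 & 0 & 0 & 0 & 0 & 1 & \times \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 \end{bmatrix}$$ 

Three pivot variables x1, x2, x6 
Four free variables x3, x4, x5, x7 
Four special solutions s in N(R) 
The pivot rows and columns contain I 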


Question What are the column space and the nullspace for this matrix R? 

Answer The columns of R have four components so they lie in R^4. (Not in R^3!) 
The fourth component of every column is zero. Every combination of the columns, 
every vector in the column space, has fourth component zero. The column space C(R) 
consists of all vectors of the form (b1, b2, b3, 0). For those vectors we can solve Rx = b. 

The nullspace N(R) is a subspace of R^7. The solutions to Rx = 0 are all the 
combinations of the four special solutions, one for each free variable: 


1. Columns 3, 4, 5, 7 have no pivots. So the four free variables are x3, x4, x5, x7. 
2. Set one free variable to 1 and set the other three free variables to zero. 
3. To find s, solve Rx = 0 for the pivot variables x1, x2, x6. 
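
The three steps above can be sketched in code. Everything concrete here is an assumption made for illustration: the function name `special_solutions`, the zero-based pivot indices, and above all the numerical entries of R, which the text leaves unspecified. Only the pattern (pivots in columns 1, 2, 6 and a zero last row) is taken from the text.

```python
def matvec(A, x):
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

def special_solutions(R, pivot_cols):
    """One special solution per free column of a reduced echelon matrix R."""
    n = len(R[0])
    free_cols = [j for j in range(n) if j not in pivot_cols]   # step 1
    solutions = []
    for f in free_cols:
        s = [0] * n
        s[f] = 1                        # step 2: this free variable is 1, the rest 0
        for row, p in zip(R, pivot_cols):
            s[p] = -row[f]              # step 3: pivot variable = minus the entry in column f
        solutions.append(s)
    return solutions

# Hypothetical entries in the free columns; the pattern follows the text.
R = [[1, 0, 2, 3, 4, 0, 5],
     [0, 1, 6, 7, 8, 0, 9],
     [0, 0, 0, 0, 0, 1, 10],
     [0, 0, 0, 0, 0, 0, 0]]
pivot_cols = [0, 1, 5]                  # columns 1, 2, 6 in the text's numbering

solutions = special_solutions(R, pivot_cols)
for s in solutions:
    print(s, matvec(R, s))
```

Step 3 needs no real solving because the pivot columns of R contain I: each pivot variable can be read off with a sign change, as the text observes for the 2 by 4 example.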


Counting the pivots leads to an extremely important theorem. Suppose A has more 
columns than rows. With n > m there is at least one free variable. The system Ax = 0 
has at least one special solution. This solution is not zero! 


Suppose Ax = 0 has more unknowns than equations (n > m, more columns than rows). 
There must be at least one free column. Then Ax = 0 has nonzero solutions. 


A short wide matrix (n > m) always has nonzero vectors in its nullspace. There must be 
at least n — m free variables, since the number of pivots cannot exceed m. (The matrix 
only has m rows, and a row never has two pivots.) Of course a row might have no pivot— 
which means an extra free variable. But here is the point: When there is a free variable, 
it can be set to 1. Then the equation Ax = 0 has at least a line of nonzero solutions. 

The nullspace is a subspace. Its “dimension” is the number of free variables. This 
central idea—the dimension of a subspace—is defined and explained in this chapter. 


The Rank of a Matrix 


The numbers m and n give the size of a matrix—but not necessarily the true size of a linear 
system. An equation like 0 = 0 should not count. If there are two identical rows in A, 
the second one disappears in elimination. Also if row 3 is a combination of rows 1 and 2, 
then row 3 will become all zeros in the triangular U and the reduced echelon form R. 
We don’t want to count rows of zeros. The true size of A is given by its rank. 


DEFINITION OF RANK The rank of A is the number of pivots. This number is r. 


That definition is computational, and I would like to say more about the rank r. 
The final matrix R will have r nonzero rows. Start with a 3 by 4 example of rank r = 2: 


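$$\begin{matrix} \text{Four columns} \\ \text{Two pivots} \end{matrix} \qquad A = \begin{bmatrix} 1 & 1 & 2 & 4 \\ 1 & 2 & 2 & 5 \\ 1 & 3 & 2 & 6 \end{bmatrix} \qquad R = \begin{bmatrix} 1 & 0 & 2 & 3 \\ 0 & 1 & 0 & 1 \\ 0 & 0 & 0 & 0 \end{bmatrix}$$ 
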
The first two columns of A are (1, 1, 1) and (1, 2, 3), going in different directions. 
Those will be pivot columns (revealed by R). The third column (2, 2, 2) is a multiple 
of the first. We won’t see a pivot in that third column. The fourth column (4, 5, 6) is the 
sum of the first three. That fourth column will also have no pivot. The rank of A and R is 2. 


Every “free column” is a combination of earlier pivot columns. It is the 
special solutions s that tell us those combinations: 


Column 3 = 2 (column 1) + 0 (column 2)    s1 = (-2, -0, 1, 0)
Column 4 = 3 (column 1) + 1 (column 2)    s2 = (-3, -1, 0, 1)


The numbers 2, 0 in column 3 of R show up in s1 (with signs reversed). And the
numbers 3, 1 in column 4 of R show up in s2 (with signs reversed to -3, -1).
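
The numbers in this example can be checked with a short Python sketch. It is only an illustration: the helper names `rref` and `special_solutions` are written here for this purpose (they are not MATLAB's built-in commands), exact fractions are assumed so that no rounding occurs, and columns are numbered from 0.

```python
from fractions import Fraction

def rref(rows):
    """Return (R, pivot_cols) for a matrix given as a list of rows."""
    R = [[Fraction(x) for x in row] for row in rows]
    m, n = len(R), len(R[0])
    pivot_cols = []
    r = 0  # index of the next pivot row
    for c in range(n):
        # look for a nonzero entry in column c, at or below row r
        p = next((i for i in range(r, m) if R[i][c] != 0), None)
        if p is None:
            continue  # free column: no pivot here
        R[r], R[p] = R[p], R[r]
        pivot = R[r][c]
        R[r] = [x / pivot for x in R[r]]  # scale so the pivot is 1
        for i in range(m):
            if i != r and R[i][c] != 0:
                factor = R[i][c]
                R[i] = [a - factor * b for a, b in zip(R[i], R[r])]
        pivot_cols.append(c)
        r += 1
        if r == m:
            break
    return R, pivot_cols

def special_solutions(R, pivot_cols):
    """One special solution per free column: that free variable is 1."""
    n = len(R[0])
    free_cols = [c for c in range(n) if c not in pivot_cols]
    solutions = []
    for f in free_cols:
        s = [Fraction(0)] * n
        s[f] = Fraction(1)
        for row, p in enumerate(pivot_cols):
            s[p] = -R[row][f]  # sign reversed, as described in the text
        solutions.append(s)
    return solutions

A = [[1, 1, 2, 4],
     [1, 2, 2, 5],
     [1, 3, 2, 6]]
R, pivots = rref(A)
specials = special_solutions(R, pivots)
print("rank =", len(pivots), "pivot columns (0-based):", pivots)
for s in specials:
    print([int(x) for x in s])
```

The two printed vectors should match s1 and s2 above, and the number of pivots is the rank r = 2.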


Rank One 


Matrices of rank one have only one pivot. When elimination produces zero in the first 
column, it produces zero in all the columns. Every row is a multiple of the pivot row. At 
the same time, every column is a multiple of the pivot column! 


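$$\text{Rank one matrix} \qquad A = \begin{bmatrix} 1 & 3 & 10 \\ 2 & 6 & 20 \\ 3 & 9 & 30 \end{bmatrix} \longrightarrow R = \begin{bmatrix} 1 & 3 & 10 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix}$$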


The column space of a rank one matrix is “one-dimensional”. Here all columns are on the 
line through u = (1,2,3). The columns of A are u and 3u and 10u. Put those numbers 
into the row v^T = [ 1 3 10 ] and you have the special rank one form A = uv^T:


$$A = \text{column times row} = uv^T \qquad \begin{bmatrix} 1 & 3 & 10 \\ 2 & 6 & 20 \\ 3 & 9 & 30 \end{bmatrix} = \begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix} \begin{bmatrix} 1 & 3 & 10 \end{bmatrix}$$


With rank one, Ax = 0 is easy to understand. That equation u(v^T x) = 0 leads us to
v^T x = 0. All vectors x in the nullspace must be orthogonal to v in the row space.


This is the geometry when r = 1: row space = line, nullspace = perpendicular plane. 
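
A few lines of Python can illustrate the rank one picture. This is a sketch under simple assumptions: the helpers `dot` and `matvec` are written here for the illustration, and the two test vectors x1 and x2 are hand-picked to be perpendicular to v.

```python
u = [1, 2, 3]      # every column of A is a multiple of u
v = [1, 3, 10]     # the row v^T holds those multiples

# Outer product: entry (i, j) of u v^T is u[i] * v[j].
A = [[ui * vj for vj in v] for ui in u]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def matvec(M, x):
    return [dot(row, x) for row in M]

# Hand-picked vectors perpendicular to v (an illustrative choice).
x1 = [-3, 1, 0]    # v . x1 = -3 + 3 + 0 = 0
x2 = [-10, 0, 1]   # v . x2 = -10 + 0 + 10 = 0

print(A)
print(dot(v, x1), matvec(A, x1))
print(dot(v, x2), matvec(A, x2))
print(matvec(A, [1, 0, 0]))   # v . x = 1 here, so Ax = 1 * u
```

Since Ax = u(v^T x), the product is the zero vector exactly when v^T x = 0. The vectors x1 and x2 span the plane perpendicular to v.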


Example 4 When all rows are multiples of one pivot row, the rank is r = 1: 


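$$\begin{bmatrix} 1 & 3 & 4 \\ 2 & 6 & 8 \end{bmatrix} \text{ and } \begin{bmatrix} 0 & 3 \\ 0 & 5 \end{bmatrix} \text{ and } \begin{bmatrix} 5 \\ 2 \end{bmatrix} \text{ and } \begin{bmatrix} 6 \end{bmatrix} \text{ all have rank 1.}$$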


For those matrices, the reduced row echelon R = rref (A) can be checked by eye: 


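$$R = \begin{bmatrix} 1 & 3 & 4 \\ 0 & 0 & 0 \end{bmatrix} \text{ and } \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix} \text{ and } \begin{bmatrix} 1 \\ 0 \end{bmatrix} \text{ and } \begin{bmatrix} 1 \end{bmatrix} \text{ have only one pivot.}$$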

Our second definition of rank will be at a higher level. It deals with entire rows and 
entire columns—vectors and not just numbers. All three matrices A and U and R have r 
independent rows. 




A and U and R also have r independent columns (the pivot columns). Section 3.4 
says what it means for rows or columns to be independent. 

A third definition of rank, at the top level of linear algebra, will deal with spaces of 
vectors. The rank r is the “dimension” of the column space. It is also the dimension of 
the row space. The great thing is that n — r is the dimension of the nullspace. 


= REVIEW OF THE KEY IDEAS = 


1. The nullspace N(A) is a subspace of R^n. It contains all solutions to Ax = 0.

2. Elimination on A produces a row reduced R with pivot columns and free columns.

3. Every free column leads to a special solution. That free variable is 1, the others are 0.

4. The rank r of A is the number of pivots. All pivots are 1’s in R = rref(A).

5. The complete solution to Ax = 0 is a combination of the n - r special solutions.

6. A always has a free column if n > m, giving a nonzero solution to Ax = 0.


= WORKED EXAMPLES = 


3.2 A Why do A and R have the same nullspace if EA = R and E is invertible? 
Solution If Ax = 0 then Rx = EAx = E0 = 0.
If Rx = 0 then Ax = E^{-1}Rx = E^{-1}0 = 0.


A and R also have the same row space and the same rank. 


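3.2 B Create a 3 by 4 matrix R whose special solutions to Rx = 0 are s1 and s2:


$$s_1 = \begin{bmatrix} -3 \\ 1 \\ 0 \\ 0 \end{bmatrix} \text{ and } s_2 = \begin{bmatrix} -2 \\ 0 \\ -6 \\ 1 \end{bmatrix} \qquad \begin{array}{l} \text{pivot columns 1 and 3} \\ \text{free variables } x_2 \text{ and } x_4 \end{array}$$


Describe all possible matrices A with this nullspace N(A) = all combinations of s1 and s2.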


Solution The reduced matrix R has pivots = 1 in columns 1 and 3. There is no third
pivot, so row 3 of R is all zeros. The free columns 2 and 4 will be combinations of the
pivot columns: 3, 0, 2, 6 in R come from -3, -0, -2, -6 in s1 and s2. Every A = ER.
Every 3 by 4 matrix has at least one special solution. These matrices have two.


$$R = \begin{bmatrix} 1 & 3 & 0 & 2 \\ 0 & 0 & 1 & 6 \\ 0 & 0 & 0 & 0 \end{bmatrix} \text{ has } Rs_1 = 0 \text{ and } Rs_2 = 0.$$

3.2 C Find the row reduced form R and the rank r of A and B (those depend on c).
Which are the pivot columns of A? What are the special solutions?


$$\text{Find special solutions} \qquad A = \begin{bmatrix} 1 & 2 & 1 \\ 3 & 6 & 3 \\ 4 & 8 & c \end{bmatrix} \text{ and } B = \begin{bmatrix} c & c \\ c & c \end{bmatrix}.$$


Solution The matrix A has row 2 = 3 (row 1). The rank of A is r = 2 except if c = 4.
Row 3 - 4 (row 1) ends in c - 4. The pivots are in columns 1 and 3. The second variable
x2 is free. Notice the form of R: Row 3 has moved up into row 2.


$$c \neq 4 \quad R = \begin{bmatrix} 1 & 2 & 0 \\ 0 & 0 & 1 \\ 0 & 0 & 0 \end{bmatrix} \qquad c = 4 \quad R = \begin{bmatrix} 1 & 2 & 1 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix}$$


Two pivots leave one free variable x2. But when c = 4, the only pivot is in column 1
(rank one). The second and third variables are free, producing two special solutions:


c ≠ 4 Special solution (-2, 1, 0)    c = 4 Another special solution (-1, 0, 1).


The 2 by 2 matrix B (with all four entries equal to c) has rank r = 1 except if c = 0, when the rank is zero!


$$c \neq 0 \quad R = \begin{bmatrix} 1 & 1 \\ 0 & 0 \end{bmatrix} \qquad c = 0 \quad R = \begin{bmatrix} 0 & 0 \\ 0 & 0 \end{bmatrix} \text{ and nullspace} = \mathbf{R}^2.$$

Problem Set 3.2

1 Reduce A and B to their triangular echelon forms U. Which variables are free?

$$\text{(a)}\ A = \begin{bmatrix} 1 & 2 & 2 & 4 & 6 \\ 1 & 2 & 3 & 6 & 9 \\ 0 & 0 & 1 & 2 & 3 \end{bmatrix} \qquad \text{(b)}\ B = \begin{bmatrix} 2 & 4 & 2 \\ 0 & 4 & 4 \\ 0 & 8 & 8 \end{bmatrix}.$$

2 For the matrices in Problem 1, find a special solution for each free variable. (Set the
free variable to 1. Set the other free variables to zero.)

3 By further row operations on each U in Problem 1, find the reduced echelon form R.
True or false with a reason: The nullspace of R equals the nullspace of U.

4 For the same A and B, find the special solutions to Ax = 0 and Bx = 0. For an m by
n matrix, the number of pivot variables plus the number of free variables is ____.
This is the Counting Theorem: r + (n - r) = n.


$$\text{(a)}\ A = \begin{bmatrix} -1 & 3 & 5 \\ -2 & 6 & 10 \end{bmatrix} \qquad \text{(b)}\ B = \begin{bmatrix} -1 & 3 & 5 \\ -2 & 6 & 7 \end{bmatrix}$$

Questions 5-14 are about free variables and pivot variables.
5 True or false (with reason if true or example to show it is false): 


(a) A square matrix has no free variables. 
(b) An invertible matrix has no free variables. 
(c) An m by n matrix has no more than n pivot variables. 


(d) An m by n matrix has no more than m pivot variables. 


6 Put as many 1’s as possible in a 4 by 7 echelon matrix U whose pivot columns are 
(a) 2,4, 5 (b) 1, 3, 6, 7 (c) 4 and 6. 
7 Put as many 1’s as possible in a 4 by 8 reduced echelon matrix R so that the free 
columns are 
(a) 2,4, 5, 6 (b) 1,3, 6, 7, 8. 
8 Suppose column 4 of a 3 by 5 matrix is all zero. Then x4 is certainly a ____
variable. The special solution for this variable is the vector x = ____.


9 Suppose the first and last columns of a 3 by 5 matrix are the same (not zero).
Then ____ is a free variable. Find the special solution for this variable.


10 Suppose an m by n matrix has r pivots. The number of special solutions is ____.
The nullspace contains only x = 0 when r = ____. The column space is all of
R^m when r = ____.


11 The nullspace of a 5 by 5 matrix contains only x = 0 when the matrix has ____
pivots. The column space is R^5 when there are ____ pivots. Explain why.


12 The equation x - 3y - z = 0 determines a plane in R^3. What is the matrix A in this
equation? Which variables are free? The special solutions are ____ and ____.


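13 (Recommended) The plane x - 3y - z = 12 is parallel to x - 3y - z = 0. One
particular point on this plane is (12, 0, 0). All points on the plane have the form


$$\begin{bmatrix} x \\ y \\ z \end{bmatrix} = \begin{bmatrix} \underline{\quad} \\ 0 \\ 0 \end{bmatrix} + y \begin{bmatrix} \underline{\quad} \\ 1 \\ 0 \end{bmatrix} + z \begin{bmatrix} \underline{\quad} \\ 0 \\ 1 \end{bmatrix}.$$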

14 Suppose column 1 + column 3 + column 5 = 0 in a 4 by 5 matrix with four pivots.
Which column has no pivot? What is the special solution? Describe N (A). 


Questions 15—22 ask for matrices (if possible) with specific properties. 
15 Construct a matrix for which N (A) = all combinations of (2, 2, 1, 0) and (3, 1, 0, 1). 
16 Construct A so that N(A) = all multiples of (4, 3, 2, 1). Its rank is ____.

17 Construct a matrix whose column space contains (1, 1, 5) and (0, 3, 1) and whose
nullspace contains (1, 1, 2).

18 Construct a matrix whose column space contains (1, 1, 0) and (0, 1, 1) and whose
nullspace contains (1, 0, 1) and (0, 0, 1).

19 Construct a matrix whose column space contains (1, 1, 1) and whose nullspace is the
line of multiples of (1, 1, 1, 1).

20 Construct a 2 by 2 matrix whose nullspace equals its column space. This is possible.

21 Why does no 3 by 3 matrix have a nullspace that equals its column space?

22 If AB = 0 then the column space of B is contained in the ____ of A. Why?

23 The reduced form R of a 3 by 3 matrix with randomly chosen entries is almost sure
to be ____. What R is virtually certain if the random A is 4 by 3?

24 Show by example that these three statements are generally false:

(a) A and A^T have the same nullspace.
(b) A and A^T have the same free variables.
(c) If R is the reduced form rref(A) then R^T is rref(A^T).

25 If N(A) = all multiples of x = (2, 1, 0, 1), what is R and what is its rank?

26 If the special solutions to Rx = 0 are in the columns of these nullspace matrices N,
go backward to find the nonzero rows of the reduced matrices R:

$$N = \begin{bmatrix} 2 & 3 \\ 1 & 0 \\ 0 & 1 \end{bmatrix} \text{ and } N = \begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix} \text{ and } N = \begin{bmatrix} \ \end{bmatrix} \text{ (empty 3 by 1).}$$

27 (a) What are the five 2 by 2 reduced matrices R whose entries are all 0’s and 1’s?

(b) What are the eight 1 by 3 matrices containing only 0’s and 1’s? Are all eight of
them reduced echelon matrices R?

28 Explain why A and -A always have the same reduced echelon form R.

29 If A is 4 by 4 and invertible, describe the nullspace of the 4 by 8 matrix B = [A A].

30 How is the nullspace N(C) related to the spaces N(A) and N(B), if $C = \begin{bmatrix} A \\ B \end{bmatrix}$?

31 Find the reduced row echelon forms R and the rank of these matrices:
(a) The 3 by 4 matrix with all entries equal to 4.
(b) The 3 by 4 matrix with a_ij = i + j - 1.
(c) The 3 by 4 matrix with a_ij = (-1)^j.

32 Kirchhoff’s Current Law A^T y = 0 says that current in = current out at every node.
At node 1 this is y3 = y1 + y4. Write the four equations for Kirchhoff’s Law
at the four nodes (arrows show the positive direction of each y). Reduce A^T to R
and find three special solutions in the nullspace of A^T (4 by 6 matrix).

[Figure: graph with nodes 1, 2, 3, 4 and edges y1, ..., y6; arrows show the positive direction of each y.]

$$A = \begin{bmatrix} -1 & 1 & 0 & 0 \\ 0 & -1 & 1 & 0 \\ 1 & 0 & -1 & 0 \\ -1 & 0 & 0 & 1 \\ 0 & -1 & 0 & 1 \\ 0 & 0 & -1 & 1 \end{bmatrix}$$

33 Which of these rules gives a correct definition of the rank of A?

(a) The number of nonzero rows in R.
(b) The number of columns minus the total number of rows.
(c) The number of columns minus the number of free columns.
(d) The number of 1’s in the matrix R.

34 Find the reduced R for each of these (block) matrices:

$$A = \begin{bmatrix} 0 & 0 & 0 \\ 0 & 0 & 3 \\ 2 & 4 & 6 \end{bmatrix} \qquad B = \begin{bmatrix} A & A \end{bmatrix} \qquad C = \begin{bmatrix} A & A \\ A & 0 \end{bmatrix}$$

35 Suppose all the pivot variables come last instead of first. Describe all four blocks in
the reduced echelon form (the block B should be r by r):

$$R = \begin{bmatrix} A & B \\ C & D \end{bmatrix}.$$

What is the nullspace matrix N containing the special solutions?

36 (Silly problem) Describe all 2 by 3 matrices A1 and A2, with row echelon forms
R1 and R2, such that R1 + R2 is the row echelon form of A1 + A2. Is it true that
R1 = A1 and R2 = A2 in this case? Does R1 - R2 equal rref(A1 - A2)?

37 If A has r pivot columns, how do you know that A^T has r pivot columns? Give a 3
by 3 example with different column numbers in pivcol for A and A^T.

38 What are the special solutions to Rx = 0 and y^T R = 0 for these R?

39 Fill out these matrices so that they have rank 1:


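$$A = \begin{bmatrix} 1 & 2 & 4 \\ 2 & & \\ 4 & & \end{bmatrix} \text{ and } B = \begin{bmatrix} & 9 & \\ 1 & & \\ 2 & 6 & -3 \end{bmatrix} \text{ and } M = \begin{bmatrix} a & b \\ c & \end{bmatrix}$$

40 If A is an m by n matrix with r = 1, its columns are multiples of one column and its
rows are multiples of one row. The column space is a ____ in R^m. The nullspace
is a ____ in R^n. The nullspace matrix N has shape ____.

41 Choose vectors u and v so that A = uv^T = column times row:

$$A = \begin{bmatrix} 3 & 6 & 6 \\ 1 & 2 & 2 \\ 4 & 8 & 8 \end{bmatrix} \text{ and } A = \begin{bmatrix} 2 & 2 & 6 & 4 \\ -1 & -1 & -3 & -2 \end{bmatrix}.$$

A = uv^T is the natural form for every matrix that has rank r = 1.

42 If A is a rank one matrix, the second row of R is ____. Do an example.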


Problems 43-45 are about r by r invertible matrices inside A. 


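43 If A has rank r, then it has an r by r submatrix S that is invertible. Remove
m - r rows and n - r columns to find an invertible submatrix S inside A, B, and C.
You could keep the pivot rows and pivot columns:

$$A = \begin{bmatrix} 1 & 2 & 3 \\ 1 & 2 & 4 \end{bmatrix} \qquad B = \begin{bmatrix} 1 & 2 & 3 \\ 2 & 4 & 6 \end{bmatrix} \qquad C = \begin{bmatrix} 0 & 1 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 1 \end{bmatrix}$$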


44 Suppose P contains only the r pivot columns of an m by n matrix. Explain why this 
m by r submatrix P has rank r. 


45 Transpose P in Problem 44. Find the r pivot columns of P^T (which is r by m).
Transposing back, this produces an r by r invertible submatrix S inside P and A:


For $A = \begin{bmatrix} 1 & 2 & 3 \\ 2 & 4 & 6 \\ 2 & 4 & 7 \end{bmatrix}$ find P (3 by 2) and then the invertible S (2 by 2).


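Problems 46-51 show that rank(AB) is not greater than rank(A) or rank(B).


46 Find the ranks of AB and AC (rank one matrix times rank one matrix):


$$A = \begin{bmatrix} 1 & 2 \\ 2 & 4 \end{bmatrix} \text{ and } B = \begin{bmatrix} 2 & 1 & 4 \\ 3 & 1.5 & 6 \end{bmatrix} \text{ and } C = \begin{bmatrix} 1 & b \\ c & bc \end{bmatrix}$$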


47 The rank one matrix uv^T times the rank one matrix wz^T is uz^T times the number
____. This product uv^T wz^T also has rank one unless ____ = 0.

48 (a) Suppose column j of B is a combination of previous columns of B. Show that
column j of AB is the same combination of previous columns of AB. Then
AB cannot have new pivot columns, so rank(AB) ≤ rank(B).

(b) Find A1 and A2 so that rank(A1 B) = 1 and rank(A2 B) = 0 for $B = \begin{bmatrix} 1 & 1 \\ 1 & 1 \end{bmatrix}$.

49 Problem 48 proved that rank(AB) ≤ rank(B). Then the same reasoning gives
rank(B^T A^T) ≤ rank(A^T). How do you deduce that rank(AB) ≤ rank A?

50 (Important) Suppose A and B are n by n matrices, and AB = I. Prove from
rank(AB) ≤ rank(A) that the rank of A is n. So A is invertible and B must be its
two-sided inverse (Section 2.5). Therefore BA = I (which is not so obvious!).

51 If A is 2 by 3 and B is 3 by 2 and AB = I, show from its rank that BA ≠ I. Give an
example of A and B with AB = I. For m < n, a right inverse is not a left inverse.

52 Suppose A and B have the same reduced row echelon form R.

(a) Show that A and B have the same nullspace and the same row space.

(b) We know E1 A = R and E2 B = R. So A equals an ____ matrix times B.

53 Express A and then B as the sum of two rank one matrices:

$$\text{rank} = 2 \qquad A = \begin{bmatrix} 1 & 1 & 0 \\ 1 & 1 & 4 \\ 1 & 1 & 8 \end{bmatrix} \qquad B = \begin{bmatrix} 2 & 2 \\ 2 & 3 \end{bmatrix}$$

54 Answer the same questions as in Worked Example 3.2 C for

$$A = \begin{bmatrix} 1 & 1 & 2 & 2 \\ 2 & 2 & 4 & 4 \\ 1 & c & 2 & 2 \end{bmatrix} \text{ and } B = \begin{bmatrix} 1-c & 2 \\ 0 & 2-c \end{bmatrix}.$$

55 What is the nullspace matrix N (containing the special solutions) for A, B, C?

$$\text{Block matrices} \qquad A = \begin{bmatrix} I & I \end{bmatrix} \text{ and } B = \begin{bmatrix} I & I \\ 0 & 0 \end{bmatrix} \text{ and } C = \begin{bmatrix} I & I & I \end{bmatrix}.$$

56 Neat fact Every m by n matrix of rank r reduces to (m by r) times (r by n):
A = (pivot columns of A) (first r rows of R) = (COL)(ROW).

Write the 3 by 4 matrix A of all ones as the product of the 3 by 1 matrix from the
pivot columns and the 1 by 4 matrix from R.

Challenge Problems


Suppose A is an m by n matrix of rank r. Its reduced echelon form is R. Describe
exactly the matrix Z (its shape and all its entries) that comes from transposing the
reduced row echelon form of R^T:


R = rref(A) and Z = (rref(R^T))^T.


(Recommended) Suppose R is m by n of rank r, with pivot columns first:


$$R = \begin{bmatrix} I & F \\ 0 & 0 \end{bmatrix}.$$

(a) What are the shapes of those four blocks?
(b) Find a right-inverse B with RB = I if r = m. The zero blocks are gone.
(c) Find a left-inverse C with CR = I if r = n. The F and 0 column is gone.
(d) What is the reduced row echelon form of R^T (with shapes)?
(e) What is the reduced row echelon form of R^T R (with shapes)?


I think that the reduced echelon form of R^T R is always R (except for extra zero
rows). Can you do an example when R is 2 by 3? Later we show that A^T A always
has the same nullspace as A (a valuable fact).


Suppose you allow elementary column operations on A as well as elementary row
operations (which get to R). What is the “row-and-column reduced form” for an m
by n matrix of rank r?


Elimination: The Big Picture


This page explains elimination at the vector level and subspace level, when A is reduced 
to R. You know the steps and I won’t repeat them. Elimination starts with the first pivot. 
It moves a column at a time (left to right) and a row at a time (top to bottom). As it moves, 
elimination answers two questions: 


Question 1 Is this column a combination of previous columns? 


If the column contains a pivot, the answer is no. Pivot columns are “independent” of 
previous columns. If column 4 has no pivot, it is a combination of columns 1, 2, 3. 


Question 2 Is this row a combination of previous rows? 


If the row contains a pivot, the answer is no. Pivot rows are “independent” of previous 
rows. If row 3 ends up with no pivot, it is a zero row and it is moved to the bottom of R. 


It is amazing to me that one pass through the matrix answers both questions. Actually 
that pass reaches the triangular echelon matrix U, not the reduced echelon matrix R. Then 
the reduction from U to R goes bottom to top. U tells which columns are combinations of 
earlier columns (pivots are missing). Then R tells us what those combinations are. 


In other words, R tells us the special solutions to Ax = 0. We could reach R from 
A by different row exchanges and elimination steps, but it will always be the same R 
(because the special solutions are decided by A). In the language coming soon, R reveals 
a “basis” for three fundamental subspaces: 


The column space of A: choose the pivot columns of A as a basis.
The row space of A: choose the nonzero rows of R as a basis.
The nullspace of A: choose the special solutions to Rx = 0 (and Ax = 0).


We learn from elimination the single most important number—the rank r. That number 
counts the pivot columns and the pivot rows. Then n — r counts the free columns and 
the special solutions. 

I mention that reducing [A I] to [R E] will tell you even more about A, in fact
virtually everything (including EA = R). The matrix E keeps a record, otherwise lost,
of the elimination from A to R. When A is square and invertible, R is I and E is A^{-1}.
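
The record-keeping role of E can be tried out in a few lines. The sketch below is an illustration only: `reduce_with_record` and `matmul` are helper names invented here, exact fractions are assumed, and the 2 by 2 matrix B is a hypothetical invertible example that does not come from the text.

```python
from fractions import Fraction

def reduce_with_record(A):
    """Gauss-Jordan on [A I]; pivots are searched only in the columns of A."""
    m, n = len(A), len(A[0])
    M = [[Fraction(x) for x in row] + [Fraction(int(i == j)) for j in range(m)]
         for i, row in enumerate(A)]
    r = 0  # index of the next pivot row
    for c in range(n):
        p = next((i for i in range(r, m) if M[i][c] != 0), None)
        if p is None:
            continue  # free column
        M[r], M[p] = M[p], M[r]
        pivot = M[r][c]
        M[r] = [x / pivot for x in M[r]]
        for i in range(m):
            if i != r and M[i][c] != 0:
                f = M[i][c]
                M[i] = [a - f * b for a, b in zip(M[i], M[r])]
        r += 1
        if r == m:
            break
    R = [row[:n] for row in M]   # left block is R
    E = [row[n:] for row in M]   # right block records the row operations
    return R, E

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

A = [[1, 1, 2, 4],
     [1, 2, 2, 5],
     [1, 3, 2, 6]]
R, E = reduce_with_record(A)
print(matmul(E, A) == R)          # E times A reproduces R

B = [[2, 1],
     [5, 3]]                      # hypothetical invertible example
RB, EB = reduce_with_record(B)
print(RB == [[1, 0], [0, 1]], matmul(EB, B) == [[1, 0], [0, 1]])
```

For the rank 2 matrix A, the last row of E gives the combination of rows of A that produces the zero row of R. For the invertible B, the right block ends as the inverse of B.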
3.3 The Complete Solution to Ax = b


Complete solution to Ax = b: x = (one particular solution x_p) + (any x_n in the nullspace).


Elimination on [A b] leads to [R d]. Then Ax = b is equivalent to Rx = d.


Ax = b and Rx = d are solvable only when all zero rows of R have zeros in d.

When Rx = d is solvable, one very particular solution x_p has all free variables equal to zero.

A has full column rank r = n when its nullspace N(A) = zero vector: no free variables.

A has full row rank r = m when its column space C(A) is R^m: Ax = b is always solvable.


The four cases are r = m = n (A is invertible) and r = m < n (every Ax = b is solvable)
and r = n < m (Ax = b has 1 or 0 solutions) and r < m, r < n (0 or ∞ solutions).


The last section totally solved Ax = 0. Elimination converted the problem to Rx = 0.
The free variables were given special values (one and zero). Then the pivot variables were
found by back substitution. We paid no attention to the right side b because it stayed
at zero. The solution x was in the nullspace of A.

Now b is not zero. Row operations on the left side must act also on the right side.
Ax = b is reduced to a simpler system Rx = d with the same solutions. One way to
organize that is to add b as an extra column of the matrix. I will “augment” A with the
right side (b1, b2, b3) = (1, 6, 7) to produce the augmented matrix [A b]:


$$\begin{bmatrix} 1 & 3 & 0 & 2 \\ 0 & 0 & 1 & 4 \\ 1 & 3 & 1 & 6 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{bmatrix} = \begin{bmatrix} 1 \\ 6 \\ 7 \end{bmatrix} \quad \text{has the augmented matrix} \quad \left[\begin{array}{cccc|c} 1 & 3 & 0 & 2 & 1 \\ 0 & 0 & 1 & 4 & 6 \\ 1 & 3 & 1 & 6 & 7 \end{array}\right] = \begin{bmatrix} A & b \end{bmatrix}$$


When we apply the usual elimination steps to A, reaching R, we also apply them to b. 
In this example we subtract row 1 from row 3. Then we subtract row 2 from row 3. 
This produces a row of zeros in R, and it changes b to a new right side d = (1,6,0): 


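$$\begin{bmatrix} 1 & 3 & 0 & 2 \\ 0 & 0 & 1 & 4 \\ 0 & 0 & 0 & 0 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{bmatrix} = \begin{bmatrix} 1 \\ 6 \\ 0 \end{bmatrix} \quad \text{has the augmented matrix} \quad \left[\begin{array}{cccc|c} 1 & 3 & 0 & 2 & 1 \\ 0 & 0 & 1 & 4 & 6 \\ 0 & 0 & 0 & 0 & 0 \end{array}\right] = \begin{bmatrix} R & d \end{bmatrix}.$$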


That very last zero is crucial. The third equation has become 0 = 0. So the equations can 
be solved. In the original matrix A, the first row plus the second row equals the third row. 
If the equations are consistent, this must be true on the right side of the equations also! 
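The all-important property of the right side b was 1 + 6 = 7. That led to 0 = 0.

Here are the same augmented matrices for a general b = (b1, b2, b3):


$$\begin{bmatrix} A & b \end{bmatrix} = \left[\begin{array}{cccc|c} 1 & 3 & 0 & 2 & b_1 \\ 0 & 0 & 1 & 4 & b_2 \\ 1 & 3 & 1 & 6 & b_3 \end{array}\right] \longrightarrow \left[\begin{array}{cccc|c} 1 & 3 & 0 & 2 & b_1 \\ 0 & 0 & 1 & 4 & b_2 \\ 0 & 0 & 0 & 0 & b_3 - b_1 - b_2 \end{array}\right] = \begin{bmatrix} R & d \end{bmatrix}$$


Now we get 0 = 0 in the third equation only if b3 - b1 - b2 = 0. This is b1 + b2 = b3.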


One Particular Solution Ax_p = b


For an easy solution x_p, choose the free variables to be zero: x2 = x4 = 0. Then the two
nonzero equations give the two pivot variables x1 = 1 and x3 = 6. Our particular
solution to Ax = b (and also Rx = d) is x_p = (1, 0, 6, 0). This particular solution
is my favorite: free variables = zero, pivot variables from d. The method always works.


For a solution to exist, zero rows in R must also be zero in d. Since I is in the
pivot rows and pivot columns of R, the pivot variables in x_particular come from d:


$$Rx_p = \begin{bmatrix} 1 & 3 & 0 & 2 \\ 0 & 0 & 1 & 4 \\ 0 & 0 & 0 & 0 \end{bmatrix} \begin{bmatrix} 1 \\ 0 \\ 6 \\ 0 \end{bmatrix} = \begin{bmatrix} 1 \\ 6 \\ 0 \end{bmatrix} \qquad \begin{array}{l} \text{Pivot variables } 1, 6 \\ \text{Free variables } 0, 0 \\ \text{Solution } x_p = (1, 0, 6, 0). \end{array}$$


Notice how we choose the free variables (as zero) and solve for the pivot variables. After 
the row reduction to R, those steps are quick. When the free variables are zero, the pivot 
variables for x, are already seen in the right side vector d. 


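x_particular    The particular solution solves Ax_p = b


x_nullspace    The n - r special solutions solve Ax_n = 0.


That particular solution is (1, 0, 6, 0). The two special (nullspace) solutions to
Rx = 0 come from the two free columns of R, by reversing signs of 3, 2, and 4.
Please notice how I write the complete solution x_p + x_n to Ax = b:


$$\begin{array}{l} \text{Complete solution} \\ \text{one } x_p \\ \text{many } x_n \end{array} \qquad x = x_p + x_n = \begin{bmatrix} 1 \\ 0 \\ 6 \\ 0 \end{bmatrix} + x_2 \begin{bmatrix} -3 \\ 1 \\ 0 \\ 0 \end{bmatrix} + x_4 \begin{bmatrix} -2 \\ 0 \\ -4 \\ 1 \end{bmatrix}.$$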


Question Suppose A is a square invertible matrix, m = n = r. What are x_p and x_n?

Answer The particular solution is the one and only solution x_p = A^{-1}b. There
are no special solutions or free variables. R = I has no zero rows. The only vector
in the nullspace is x_n = 0. The complete solution is x = x_p + x_n = A^{-1}b + 0.

We didn’t mention the nullspace in Chapter 2, because A was invertible and N(A)
contained only the zero vector. Reduction went from [A b] to [I A^{-1}b]. The matrix
A was reduced all the way to I. Then Ax = b became x = A^{-1}b which is d. This is a
special case here, but square invertible matrices are the ones we see most often in practice.
So they got their own chapter at the start of the book.


For small examples we can reduce [A b] to [R d]. For a large matrix,
MATLAB does it better. One particular solution (not necessarily ours) is x = A\b
from backslash. Here is an example with full column rank. Both columns have pivots.


Example 1 Find the condition on (b1, b2, b3) for Ax = b to be solvable, if


$$A = \begin{bmatrix} 1 & 1 \\ 1 & 2 \\ -2 & -3 \end{bmatrix} \text{ and } b = \begin{bmatrix} b_1 \\ b_2 \\ b_3 \end{bmatrix}$$


This condition puts b in the column space of A. Find the complete x = x_p + x_n.
Solution Use the augmented matrix, with its extra column b. Subtract row 1 of [A b]
from row 2. Then add 2 times row 1 to row 3 to reach [R d]:


$$\left[\begin{array}{cc|c} 1 & 1 & b_1 \\ 1 & 2 & b_2 \\ -2 & -3 & b_3 \end{array}\right] \longrightarrow \left[\begin{array}{cc|c} 1 & 1 & b_1 \\ 0 & 1 & b_2 - b_1 \\ 0 & -1 & b_3 + 2b_1 \end{array}\right] \longrightarrow \left[\begin{array}{cc|c} 1 & 0 & 2b_1 - b_2 \\ 0 & 1 & b_2 - b_1 \\ 0 & 0 & b_3 + b_1 + b_2 \end{array}\right]$$


The last equation is 0 = 0 provided b3 + b1 + b2 = 0. This is the condition to put b in
the column space. Then Ax = b will be solvable. The rows of A add to the zero row. 
So for consistency (these are equations!) the entries of b must also add to zero. 

This example has no free variables since n - r = 2 - 2. Therefore no special solutions.
The nullspace solution is x_n = 0. The particular solution to Ax = b and Rx = d is at the
top of the final column d:


$$\text{Only solution to } Ax = b \qquad x = x_p + x_n = \begin{bmatrix} 2b_1 - b_2 \\ b_2 - b_1 \end{bmatrix} + \begin{bmatrix} 0 \\ 0 \end{bmatrix}.$$


If b3 + b1 + b2 is not zero, there is no solution to Ax = b (x_p and x_n don’t exist).
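
The solvability condition and the formula for the only solution can be tried numerically. In this sketch the function name `solve_example` is hypothetical and applies only to this particular A. It returns None when b is not in the column space.

```python
from fractions import Fraction

A = [[1, 1],
     [1, 2],
     [-2, -3]]

def matvec(M, x):
    return [sum(a * xi for a, xi in zip(row, x)) for row in M]

def solve_example(b):
    """Return the only solution of Ax = b, or None when b1 + b2 + b3 != 0."""
    b1, b2, b3 = (Fraction(t) for t in b)
    if b1 + b2 + b3 != 0:
        return None                  # b is not in the column space
    return [2 * b1 - b2, b2 - b1]    # the top two entries of d

x = solve_example([1, 2, -3])        # entries of b add to zero
print(x, matvec(A, x))
print(solve_example([1, 2, 0]))      # entries of b do not add to zero
```

The first right side passes the test b1 + b2 + b3 = 0, so a solution exists and is unique. The second fails the test, and no x can work because the rows of A add to the zero row.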

This example is typical of an extremely important case: A has full column rank. 
Every column has a pivot. The rank is r = n. The matrix is tall and thin (m > n). 
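Row reduction puts I at the top, when A is reduced to R with rank n:


$$\text{Full column rank} \qquad R = \begin{bmatrix} I \\ 0 \end{bmatrix} = \begin{bmatrix} n \text{ by } n \text{ identity matrix} \\ m - n \text{ rows of zeros} \end{bmatrix} \qquad (1)$$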


There are no free columns or free variables. The nullspace is Z = {zero vector}. 
We will collect together the different ways of recognizing this type of matrix. 




Every matrix A with full column rank (r = n) has all these properties:


1. All columns of A are pivot columns.

2. There are no free variables or special solutions.

3. The nullspace N(A) contains only the zero vector x = 0.

4. If Ax = b has a solution (it might not) then it has only one solution.


In the essential language of the next section, this A has independent columns.
Ax = 0 only happens when x = 0. In Chapter 4 we will add one more fact to the list:
The square matrix A^T A is invertible when the rank is n.

In this case the nullspace of A (and R) has shrunk to the zero vector. The solution 
to Ax = b is unique (if it exists). There will be m — n zero rows in R. So there are 
m — n conditions on b in order to have 0 = 0 in those rows, and b in the column space. 
With full column rank, Ax = b has one solution or no solution (m > n is overdetermined). 


The Complete Solution 


The other extreme case is full row rank. Now Ax = b has one or infinitely many solutions. 
In this case A must be short and wide (m < n). A matrix has full row rank if r = m. 
“The rows are independent.” Every row has a pivot, and here is an example. 


Example 2 This system Ax = b has n = 3 unknowns but only m = 2 equations: 


Full row rank (rank r = m = 2)


$$\begin{aligned} x + y + z &= 3 \\ x + 2y - z &= 4 \end{aligned}$$


These are two planes in xyz space. The planes are not parallel so they intersect in a line. 
This line of solutions is exactly what elimination will find. The particular solution will 
be one point on the line. Adding the nullspace vectors x,, will move us along the line in 
Figure 3.3. Then x = £p + £n gives the whole line of solutions. 


Figure 3.3: Complete solution = one particular solution + all nullspace solutions.
[Figure labels: line of solutions to Ax = 0; line of solutions to Ax = b; x = x_p + x_n.]

We find x_p and x_n by elimination on [A b]. Subtract row 1 from row 2 and then
subtract row 2 from row 1:


$$\left[\begin{array}{ccc|c} 1 & 1 & 1 & 3 \\ 1 & 2 & -1 & 4 \end{array}\right] \longrightarrow \left[\begin{array}{ccc|c} 1 & 1 & 1 & 3 \\ 0 & 1 & -2 & 1 \end{array}\right] \longrightarrow \left[\begin{array}{ccc|c} 1 & 0 & 3 & 2 \\ 0 & 1 & -2 & 1 \end{array}\right] = \begin{bmatrix} R & d \end{bmatrix}.$$

The particular solution has free variable x3 = 0. The special solution has x3 = 1:
x_particular comes directly from d on the right side: x_p = (2, 1, 0)
x_special comes from the third column (free column) of R: s = (-3, 2, 1)


It is wise to check that x_p and s satisfy the original equations Ax_p = b and As = 0:


2 + 1 = 3        -3 + 2 + 1 = 0
2 + 2 = 4        -3 + 4 - 1 = 0

The nullspace solution x_n is any multiple of s. It moves along the line of solutions, starting
at x_particular. Please notice again how to write the answer:


$$\text{Complete solution} \qquad x = x_p + x_n = \begin{bmatrix} 2 \\ 1 \\ 0 \end{bmatrix} + x_3 \begin{bmatrix} -3 \\ 2 \\ 1 \end{bmatrix}.$$

This line of solutions is drawn in Figure 3.3. Any point on the line could have been chosen 
as the particular solution. We chose the point with x3 = 0. 

The particular solution is not multiplied by an arbitrary constant! The special solution 
needs that constant, and you understand why: to produce all x_n in the nullspace.
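
The remark that any point on the line could serve as the particular solution can be illustrated with a small sketch. The helper `point_on_line` and the sample multipliers are choices made here for illustration only.

```python
A = [[1, 1, 1],
     [1, 2, -1]]
b = [3, 4]
x_p = [2, 1, 0]    # particular solution with free variable x3 = 0
s = [-3, 2, 1]     # special solution with x3 = 1

def matvec(M, x):
    return [sum(a * xi for a, xi in zip(row, x)) for row in M]

def point_on_line(c):
    """x = x_p + c * s, one point on the line for each number c."""
    return [p + c * si for p, si in zip(x_p, s)]

for c in (-1, 0, 1, 2):
    x = point_on_line(c)
    print(c, x, matvec(A, x) == b)

# Another point on the line works equally well as a particular solution.
other_p = point_on_line(2)
difference = [u - w for u, w in zip(other_p, x_p)]
print(matvec(A, other_p) == b, matvec(A, difference) == [0, 0])
```

Two solutions of Ax = b always differ by a vector in the nullspace, which is why the choice of x_p changes the starting point but not the line.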

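Now we summarize this short wide case of full row rank. If m < n the equation
Ax = b is underdetermined (many solutions).


Every matrix A with full row rank (r = m) has all these properties:


1. All rows have pivots, and R has no zero rows.

2. Ax = b has a solution for every right side b.

3. The column space is the whole space R^m.

4. There are n - r = n - m special solutions in the nullspace of A.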


In this case with m pivots, the rows are “linearly independent”. So the columns of A^T
are linearly independent. The nullspace of A^T is the zero vector.

We are ready for the definition of linear independence, as soon as we summarize the
four possibilities, which depend on the rank. Notice how r, m, n are the critical numbers.


The four possibilities for linear equations depend on the rank r


r = m and r = n    Square and invertible    Ax = b has 1 solution
r = m and r < n    Short and wide           Ax = b has ∞ solutions
r < m and r = n    Tall and thin            Ax = b has 0 or 1 solution
r < m and r < n    Not full rank            Ax = b has 0 or ∞ solutions


The reduced R will fall in the same category as the matrix A. In case the pivot columns
happen to come first, we can display these four possibilities for R. For Rx = d (and the
original Ax = b) to be solvable, d must end in m - r zeros. F is the free part of R.


$$\begin{array}{lcccc} \text{Four types for } R & \begin{bmatrix} I \end{bmatrix} & \begin{bmatrix} I & F \end{bmatrix} & \begin{bmatrix} I \\ 0 \end{bmatrix} & \begin{bmatrix} I & F \\ 0 & 0 \end{bmatrix} \\ \text{Their ranks} & r = m = n & r = m < n & r = n < m & r < m,\ r < n \end{array}$$

Cases 1 and 2 have full row rank r = m. Cases 1 and 3 have full column rank r = n. 
Case 4 is the most general in theory and it is the least common in practice. 
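
The table of four possibilities translates directly into a small classifier. This is a sketch only: `rank` and `classify` are helper names chosen here, the rank is found by counting pivots with exact fractions, and the 2 by 2 matrix in the list is a hypothetical invertible example (the other three come from this section).

```python
from fractions import Fraction

def rank(A):
    """Count pivots by forward elimination with exact fractions."""
    M = [[Fraction(x) for x in row] for row in A]
    m, n = len(M), len(M[0])
    r = 0  # number of pivots found so far
    for c in range(n):
        p = next((i for i in range(r, m) if M[i][c] != 0), None)
        if p is None:
            continue  # no pivot in this column
        M[r], M[p] = M[p], M[r]
        for i in range(r + 1, m):
            f = M[i][c] / M[r][c]
            M[i] = [a - f * b for a, b in zip(M[i], M[r])]
        r += 1
        if r == m:
            break
    return r

def classify(A):
    """Place A in one of the four cases, using only m, n and r."""
    m, n, r = len(A), len(A[0]), rank(A)
    if r == m and r == n:
        return "square and invertible: 1 solution"
    if r == m:
        return "short and wide: infinitely many solutions"
    if r == n:
        return "tall and thin: 0 or 1 solution"
    return "not full rank: 0 or infinitely many solutions"

examples = [
    [[1, 1], [1, 2]],                            # hypothetical invertible matrix
    [[1, 1, 1], [1, 2, -1]],                     # Example 2
    [[1, 1], [1, 2], [-2, -3]],                  # Example 1
    [[1, 3, 0, 2], [0, 0, 1, 4], [1, 3, 1, 6]],  # opening example of this section
]
for A in examples:
    print(len(A), "by", len(A[0]), "rank", rank(A), "->", classify(A))
```

Whether the answer is "0" or "1" (or "0" or "infinitely many") in the last two cases depends on b, which the classifier does not look at.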


= REVIEW OF THE KEY IDEAS =


1. The rank r is the number of pivots. The matrix R has m - r zero rows.

2. Ax = b is solvable if and only if the last m - r equations reduce to 0 = 0.

3. One particular solution x_p has all free variables equal to zero.

4. The pivot variables are determined after the free variables are chosen.

5. Full column rank r = n means no free variables: one solution or none.

6. Full row rank r = m means one solution if m = n or infinitely many if m < n.


= WORKED EXAMPLES =


3.3 A This question connects elimination (pivot columns and back substitution) to 
column space-nullspace-rank-solvability (the higher level picture). A has rank 2: 


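$$Ax = b \text{ is } \quad \begin{aligned} x_1 + 2x_2 + 3x_3 + 5x_4 &= b_1 \\ 2x_1 + 4x_2 + 8x_3 + 12x_4 &= b_2 \\ 3x_1 + 6x_2 + 7x_3 + 13x_4 &= b_3 \end{aligned}$$

1. Reduce [A b] to [U c], so that Ax = b becomes a triangular system Ux = c.

2. Find the condition on b1, b2, b3 for Ax = b to have a solution.

3. Describe the column space of A. Which plane in R^3?

4. Describe the nullspace of A. Which special solutions in R^4?

5. Reduce [U c] to [R d]: Special solutions from R, particular solution from d.

6. Find a particular solution to Ax = (0, 6, -6) and then the complete solution.


Solution


1. Elimination takes [A b] into [U c]:

$$\left[\begin{array}{cccc|c} 1 & 2 & 3 & 5 & b_1 \\ 2 & 4 & 8 & 12 & b_2 \\ 3 & 6 & 7 & 13 & b_3 \end{array}\right] \longrightarrow \left[\begin{array}{cccc|c} 1 & 2 & 3 & 5 & b_1 \\ 0 & 0 & 2 & 2 & b_2 - 2b_1 \\ 0 & 0 & -2 & -2 & b_3 - 3b_1 \end{array}\right] \longrightarrow \left[\begin{array}{cccc|c} 1 & 2 & 3 & 5 & b_1 \\ 0 & 0 & 2 & 2 & b_2 - 2b_1 \\ 0 & 0 & 0 & 0 & b_3 + b_2 - 5b_1 \end{array}\right]$$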


2. The last equation shows the solvability condition b3 + b2 - 5b1 = 0. Then 0 = 0.

3. First description: The column space is the plane containing all combinations of the
pivot columns (1, 2, 3) and (3, 8, 7). The pivots are in columns 1 and 3. Second
description: The column space contains all vectors with b3 + b2 - 5b1 = 0. That
makes Ax = b solvable, so b is in the column space. All columns of A pass this test
b3 + b2 - 5b1 = 0. This is the equation for the plane in the first description!


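4. The special solutions have free variables x2 = 1, x4 = 0 and then x2 = 0, x4 = 1:


$$\begin{array}{l} \text{Special solutions to } Ax = 0 \\ \text{Back substitution in } Ux = 0 \\ \text{or change signs of } 2, 2, 1 \text{ in } R \end{array} \qquad s_1 = \begin{bmatrix} -2 \\ 1 \\ 0 \\ 0 \end{bmatrix} \qquad s_2 = \begin{bmatrix} -2 \\ 0 \\ -1 \\ 1 \end{bmatrix}$$


The nullspace N(A) in R^4 contains all x_n = c1 s1 + c2 s2.


5. In the reduced form R, the third column changes from (3, 2, 0) in U to (0, 1, 0).
The right side c = (0, 6, 0) becomes d = (-9, 3, 0) showing -9 and 3 in x_p:

$$\begin{bmatrix} U & c \end{bmatrix} = \left[\begin{array}{cccc|c} 1 & 2 & 3 & 5 & 0 \\ 0 & 0 & 2 & 2 & 6 \\ 0 & 0 & 0 & 0 & 0 \end{array}\right] \longrightarrow \begin{bmatrix} R & d \end{bmatrix} = \left[\begin{array}{cccc|c} 1 & 2 & 0 & 2 & -9 \\ 0 & 0 & 1 & 1 & 3 \\ 0 & 0 & 0 & 0 & 0 \end{array}\right]$$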


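6. One particular solution x_p has free variables = zero. Back substitute in Ux = c:


$$\begin{array}{l} \text{Particular solution to } Ax_p = b \\ \text{Bring } -9 \text{ and } 3 \text{ from the vector } d \\ \text{Free variables } x_2 \text{ and } x_4 \text{ are zero} \end{array} \qquad x_p = \begin{bmatrix} -9 \\ 0 \\ 3 \\ 0 \end{bmatrix}$$


The complete solution to Ax = (0, 6, -6) is x = x_p + x_n = x_p + c1 s1 + c2 s2.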
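
The two descriptions of the column space in this worked example can be compared with a short sketch. The helper names `in_column_space` and `matvec` are illustrative, and the test applies only to this particular A.

```python
A = [[1, 2, 3, 5],
     [2, 4, 8, 12],
     [3, 6, 7, 13]]

def in_column_space(b):
    """Solvability test from the worked example: b3 + b2 - 5*b1 = 0."""
    return b[2] + b[1] - 5 * b[0] == 0

def matvec(M, x):
    return [sum(a * xi for a, xi in zip(row, x)) for row in M]

columns = [list(col) for col in zip(*A)]
print([in_column_space(col) for col in columns])   # do all columns pass?

x_p = [-9, 0, 3, 0]     # particular solution, free variables zero
s1 = [-2, 1, 0, 0]      # special solutions
s2 = [-2, 0, -1, 1]

b = matvec(A, x_p)
print(b, in_column_space(b))
print(matvec(A, s1), matvec(A, s2))
print(in_column_space([1, 0, 0]))   # a vector that is off the plane
```

Each column of A lies on the plane b3 + b2 - 5b1 = 0, and so does every combination Ax. A vector such as (1, 0, 0) fails the test, so Ax = (1, 0, 0) has no solution.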
3.3 B Suppose you have this information about the solutions to Ax = b for a specific b.
What does that tell you about m and n and r (and A itself)? And possibly about b. 


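1. There is exactly one solution.

2. All solutions to Ax = b have the form $x = \begin{bmatrix} 2 \\ 1 \end{bmatrix} + c \begin{bmatrix} 1 \\ 1 \end{bmatrix}$.

3. There are no solutions.

4. All solutions to Ax = b have the form $x = \begin{bmatrix} 1 \\ 1 \\ 0 \end{bmatrix} + c \begin{bmatrix} 1 \\ 0 \\ 1 \end{bmatrix}$.

5. There are infinitely many solutions.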

Solution In case 1, with exactly one solution, A must have full column rank r = n. 
The nullspace of A contains only the zero vector. Necessarily m > n. 

In case 2, A must have n = 2 columns (and m is arbitrary). With (1, 1) in the nullspace
of A, column 2 is the negative of column 1. Also A ≠ 0: the rank is 1. With x = (2, 1) as a
solution, b = 2(column 1) + (column 2). My choice for x_p would be (1, 0).

In case 3 we only know that b is not in the column space of A. The rank of A must be
less than m. I guess we know b ≠ 0, otherwise x = 0 would be a solution.

In case 4, A must have n = 3 columns. With (1,0, 1) in the nullspace of A, column 3 
is the negative of column 1. Column 2 must not be a multiple of column 1, or the nullspace 
would contain another special solution. So the rank of A is 3 — 1 = 2. Necessarily A has 
m > 2 rows. The right side b is column 1 + column 2. 

In case 5 with infinitely many solutions, the nullspace must contain nonzero vectors. 
The rank r must be less than n (not full column rank), and b must be in the column space 
of A. We don’t know if every b is in the column space, so we don’t know if r = m. 


3.3 C Find the complete solution x = x_p + x_n by forward elimination on [A b]:

\[
\begin{bmatrix} 1&2&1&0 \\ 2&4&4&8 \\ 4&8&6&8 \end{bmatrix}
\begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{bmatrix}
= \begin{bmatrix} 4 \\ 2 \\ 10 \end{bmatrix}
\]

Find numbers y1, y2, y3 so that y1 (row 1) + y2 (row 2) + y3 (row 3) = zero row. Check
that b = (4, 2, 10) satisfies the condition y1b1 + y2b2 + y3b3 = 0. Why is this the condition
for the equations to be solvable and b to be in the column space?


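Solution Forward elimination on [A b] produces a zero row in [U c]. The third equation
becomes 0 = 0 and the equations are consistent (and solvable):

\[
\begin{bmatrix} 1&2&1&0&4 \\ 2&4&4&8&2 \\ 4&8&6&8&10 \end{bmatrix}
\longrightarrow
\begin{bmatrix} 1&2&1&0&4 \\ 0&0&2&8&-6 \\ 0&0&2&8&-6 \end{bmatrix}
\longrightarrow
\begin{bmatrix} 1&2&1&0&4 \\ 0&0&2&8&-6 \\ 0&0&0&0&0 \end{bmatrix}
\]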


Columns 1 and 3 contain pivots. The variables x2 and x4 are free. If we set those to zero 
we can solve (back substitution) for the particular solution or we continue to R. 
Rx = d shows that the particular solution with free variables = 0 is x_p = (7, 0, −3, 0).

\[
\begin{bmatrix} 1&2&1&0&4 \\ 0&0&2&8&-6 \\ 0&0&0&0&0 \end{bmatrix}
\longrightarrow
\begin{bmatrix} 1&2&1&0&4 \\ 0&0&1&4&-3 \\ 0&0&0&0&0 \end{bmatrix}
\longrightarrow
\begin{bmatrix} 1&2&0&-4&7 \\ 0&0&1&4&-3 \\ 0&0&0&0&0 \end{bmatrix}
\]

For the nullspace part x_n with b = 0, set the free variables x2, x4 to 1, 0 and also 0, 1:

Special solutions s1 = (−2, 1, 0, 0) and s2 = (4, 0, −4, 1)

Then the complete solution to Ax = b (and Rx = d) is x_complete = x_p + c1 s1 + c2 s2.
The rows of A produced the zero row from 2(row 1) + (row 2) − (row 3) = (0, 0, 0, 0).
Thus y = (2, 1, −1). The same combination for b = (4, 2, 10) gives 2(4) + (2) − (10) = 0.
If a combination of the rows (on the left side) gives the zero row, then the same combi- 
nation must give zero on the right side. Of course! Otherwise no solution. 


Later we will say this again in different words: If every column of A is perpendicular
to y = (2, 1, −1), then any combination b of those columns must also be perpendicular to
y. Otherwise b is not in the column space and Ax = b is not solvable.

And again: If y is in the nullspace of A^T then y must be perpendicular to every b in
the column space of A. Just looking ahead...
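The solvability test in 3.3 C can be tried directly. This is a minimal sketch using the A, b and y of the worked example; the variable names and the extra right side `b_bad` are illustrative choices, not part of the text.

```python
A = [[1, 2, 1, 0],
     [2, 4, 4, 8],
     [4, 8, 6, 8]]
b = [4, 2, 10]
y = [2, 1, -1]

# y1(row 1) + y2(row 2) + y3(row 3): the same thing as y^T A
row_combination = [sum(y[i] * A[i][j] for i in range(3)) for j in range(4)]

def dot(u, v):
    """Dot product of two vectors of equal length."""
    return sum(ui * vi for ui, vi in zip(u, v))

print(row_combination)   # [0, 0, 0, 0]  the zero row
print(dot(y, b))         # 0   so b passes the test

# A right side that fails the test cannot be in the column space.
b_bad = [4, 2, 11]       # hypothetical right side, chosen to fail
print(dot(y, b_bad))     # -1
```

Since y^T A is the zero row, y^T(Ax) = 0 for every x. A right side with y·b ≠ 0, like `b_bad`, can never equal Ax.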


Problem Set 3.3 


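1 (Recommended) Execute the six steps of Worked Example 3.3 A to describe the
column space and nullspace of A and the complete solution to Ax = b:

\[
A = \begin{bmatrix} 2&4&6&4 \\ 2&5&7&6 \\ 2&3&5&2 \end{bmatrix} \qquad
b = \begin{bmatrix} b_1 \\ b_2 \\ b_3 \end{bmatrix} = \begin{bmatrix} 4 \\ 3 \\ 5 \end{bmatrix}
\]

2 Carry out the same six steps for this matrix A with rank one. You will find two
conditions on b1, b2, b3 for Ax = b to be solvable. Together these two conditions
put b into the ____ space (two planes give a line):

\[
A = \begin{bmatrix} 2&1&3 \\ 6&3&9 \\ 4&2&6 \end{bmatrix} \qquad
b = \begin{bmatrix} b_1 \\ b_2 \\ b_3 \end{bmatrix} = \begin{bmatrix} 10 \\ 30 \\ 20 \end{bmatrix}
\]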


Questions 3-15 are about the solution of Ax = b. Follow the steps in the text to x_p
and x_n. Start from the augmented matrix with last column b.

3 Write the complete solution as x_p plus any multiple of s in the nullspace:

x + 3y + 3z = 1
2x + 6y + 9z = 5
−x − 3y + 3z = 5.
4 Find the complete solution (also called the general solution) to

\[
\begin{bmatrix} 1&3&1&2 \\ 2&6&4&8 \\ 0&0&2&4 \end{bmatrix}
\begin{bmatrix} x \\ y \\ z \\ t \end{bmatrix}
= \begin{bmatrix} 1 \\ 3 \\ 1 \end{bmatrix}.
\]

5 Under what condition on b1, b2, b3 is this system solvable? Include b as a fourth
column in elimination. Find all solutions when that condition holds:

x + 2y − 2z = b1
2x + 5y − 4z = b2
4x + 9y − 8z = b3.

6 What conditions on b1, b2, b3, b4 make each system solvable? Find x in that case:

\[
\begin{bmatrix} 1&2 \\ 2&4 \\ 2&5 \\ 3&9 \end{bmatrix}
\begin{bmatrix} x_1 \\ x_2 \end{bmatrix}
= \begin{bmatrix} b_1 \\ b_2 \\ b_3 \\ b_4 \end{bmatrix}
\qquad
\begin{bmatrix} 1&2&3 \\ 2&4&6 \\ 2&5&7 \\ 3&9&12 \end{bmatrix}
\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}
= \begin{bmatrix} b_1 \\ b_2 \\ b_3 \\ b_4 \end{bmatrix}.
\]

7 Show by elimination that (b1, b2, b3) is in the column space if b3 − 2b2 + 4b1 = 0.

\[
A = \begin{bmatrix} 1&3&1 \\ 3&8&2 \\ 2&4&0 \end{bmatrix}
\]

What combination of the rows of A gives the zero row?

8 Which vectors (b1, b2, b3) are in the column space of A? Which combinations of the
rows of A give zero?

\[
\text{(a)} \quad A = \begin{bmatrix} 1&2&1 \\ 2&6&3 \\ 0&2&5 \end{bmatrix}
\qquad
\text{(b)} \quad A = \begin{bmatrix} 1&1&1 \\ 1&2&4 \\ 2&4&8 \end{bmatrix}
\]

9 (a) The Worked Example 3.3 A reached [U c] from [A b]. Put the multipliers
into L and verify that LU equals A and Lc equals b.

(b) Combine the pivot columns of A with the numbers −9 and 3 in the particular
solution x_p. What is that linear combination and why?

10 Construct a 2 by 3 system Ax = b with particular solution x_p = (2, 4, 0) and
homogeneous solution x_n = any multiple of (1, 1, 1).

11 Why can’t a 1 by 3 system have x_p = (2, 4, 0) and x_n = any multiple of (1, 1, 1)?
12 (a) If Ax = b has two solutions x1 and x2, find two solutions to Ax = 0.

(b) Then find another solution to Ax = 0 and another solution to Ax = b.

13 Explain why these are all false:

(a) The complete solution is any linear combination of x_p and x_n.
(b) A system Ax = b has at most one particular solution.
(c) The solution x_p with all free variables zero is the shortest solution (minimum
length ||x||). Find a 2 by 2 counterexample.
(d) If A is invertible there is no solution x_n in the nullspace.

14 Suppose column 5 of U has no pivot. Then x5 is a ____ variable. The zero vector
(is) (is not) the only solution to Ax = 0. If Ax = b has a solution, then it has ____
solutions.

15 Suppose row 3 of U has no pivot. Then that row is ____. The equation Ux = c
is only solvable provided ____. The equation Ax = b (is) (is not) (might not be)
solvable.


Questions 16-20 are about matrices of “full rank” r = m or r = n.

16 The largest possible rank of a 3 by 5 matrix is ____. Then there is a pivot in every
____ of U and R. The solution to Ax = b (always exists) (is unique). The column
space of A is ____. An example is A = ____.

17 The largest possible rank of a 6 by 4 matrix is ____. Then there is a pivot in
every ____ of U and R. The solution to Ax = b (always exists) (is unique). The
nullspace of A is ____. An example is A = ____.

18 Find by elimination the rank of A and also the rank of A^T:

\[
A = \begin{bmatrix} 1&4&0 \\ 2&11&5 \\ -1&2&10 \end{bmatrix}
\quad \text{and} \quad
A = \begin{bmatrix} 1&0&1 \\ 1&1&2 \\ 1&1&q \end{bmatrix}
\quad (\text{rank depends on } q).
\]

19 Find the rank of A and also of A^T A and also of A A^T:

\[
A = \begin{bmatrix} 1&1&5 \\ 1&0&1 \end{bmatrix}
\quad \text{and} \quad
A = \begin{bmatrix} 2&0 \\ 1&1 \\ 1&2 \end{bmatrix}.
\]

20 Reduce A to its echelon form U. Then find a triangular L so that A = LU.

\[
A = \begin{bmatrix} 3&4&1&0 \\ 6&5&2&1 \end{bmatrix}
\quad \text{and} \quad
A = \begin{bmatrix} 1&0&1&0 \\ 2&2&0&3 \\ 0&6&5&4 \end{bmatrix}.
\]
21 Find the complete solution in the form x_p + x_n to these full rank systems:

\[
\text{(a)} \quad x + y + z = 4
\qquad
\text{(b)} \quad \begin{array}{l} x + y + z = 4 \\ x - y + z = 4. \end{array}
\]

22 If Ax = b has infinitely many solutions, why is it impossible for Ax = B (new
right side) to have only one solution? Could Ax = B have no solution?

23 Choose the number q so that (if possible) the ranks are (a) 1, (b) 2, (c) 3:

\[
A = \begin{bmatrix} 6&4&2 \\ -3&-2&-1 \\ 9&6&q \end{bmatrix}
\quad \text{and} \quad
B = \begin{bmatrix} 3&1&3 \\ q&2&q \end{bmatrix}.
\]

24 Give examples of matrices A for which the number of solutions to Ax = b is

(a) 0 or 1, depending on b
(b) ∞, regardless of b
(c) 0 or ∞, depending on b
(d) 1, regardless of b.

25 Write down all known relations between r and m and n if Ax = b has

(a) no solution for some b
(b) infinitely many solutions for every b
(c) exactly one solution for some b, no solution for other b
(d) exactly one solution for every b.

Questions 26-33 are about Gauss-Jordan elimination (upwards as well as downwards)
and the reduced echelon matrix R.

26 Continue elimination from U to R. Divide rows by pivots so the new pivots are all 1.
Then produce zeros above those pivots to reach R:

\[
U = \begin{bmatrix} 2&4&4 \\ 0&3&6 \\ 0&0&0 \end{bmatrix}
\quad \text{and} \quad
U = \begin{bmatrix} 2&4&4 \\ 0&3&6 \\ 0&0&5 \end{bmatrix}.
\]

27 If A is a triangular matrix, when is R = rref(A) equal to I?

28 Apply Gauss-Jordan elimination to Ux = 0 and Ux = c. Reach Rx = 0 and
Rx = d:

\[
[\,U \;\; 0\,] = \begin{bmatrix} 1&2&3&0 \\ 0&0&4&0 \end{bmatrix}
\quad \text{and} \quad
[\,U \;\; c\,] = \begin{bmatrix} 1&2&3&5 \\ 0&0&4&8 \end{bmatrix}.
\]

Solve Rx = 0 to find x_n (its free variable is x2 = 1). Solve Rx = d to find x_p (its
free variable is x2 = 0).
29 Apply Gauss-Jordan elimination to reduce to Rx = 0 and Rx = d:

\[
[\,U \;\; 0\,] = \begin{bmatrix} 3&0&6&0 \\ 0&0&2&0 \\ 0&0&0&0 \end{bmatrix}
\quad \text{and} \quad
[\,U \;\; c\,] = \begin{bmatrix} 3&0&6&9 \\ 0&0&2&4 \\ 0&0&0&5 \end{bmatrix}.
\]

Solve Ux = 0 or Rx = 0 to find x_n (free variable = 1). What are the solutions to
Rx = d?

30 Reduce to Ux = c (Gaussian elimination) and then Rx = d (Gauss-Jordan):

\[
Ax = \begin{bmatrix} 1&0&2&3 \\ 1&3&2&0 \\ 2&0&4&9 \end{bmatrix}
\begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{bmatrix}
= \begin{bmatrix} 2 \\ 5 \\ 10 \end{bmatrix} = b.
\]

Find a particular solution x_p and all homogeneous solutions x_n.


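31 Find matrices A and B with the given property or explain why you can’t:

\[
\text{(a) The only solution of } Ax = \begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix}
\text{ is } x = \begin{bmatrix} 0 \\ 1 \end{bmatrix}.
\qquad
\text{(b) The only solution of } Bx = \begin{bmatrix} 0 \\ 1 \end{bmatrix}
\text{ is } x = \begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix}.
\]

32 Find the LU factorization of A and the complete solution to Ax = b:

\[
A = \begin{bmatrix} 1&3&1 \\ 1&2&3 \\ 2&4&6 \\ 1&1&5 \end{bmatrix}
\quad \text{and} \quad
b = \begin{bmatrix} 1 \\ 3 \\ 6 \\ 5 \end{bmatrix}
\quad \text{and then} \quad
b = \begin{bmatrix} 1 \\ 0 \\ 0 \\ 0 \end{bmatrix}.
\]

33 The complete solution to
\( Ax = \begin{bmatrix} 1 \\ 3 \end{bmatrix} \) is
\( x = \begin{bmatrix} 1 \\ 0 \end{bmatrix} + c \begin{bmatrix} 0 \\ 1 \end{bmatrix} \). Find A.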
Challenge Problems

34 (Recommended!) Suppose you know that the 3 by 4 matrix A has the vector s =
(2, 3, 1, 0) as the only special solution to Ax = 0.

(a) What is the rank of A and the complete solution to Ax = 0?

(b) What is the exact row reduced echelon form R of A?

(c) How do you know that Ax = b can be solved for all b?

35 Suppose K is the 9 by 9 second difference matrix (2’s on the diagonal, −1’s on
the diagonal above and also below). Solve the equation Kx = b = (10, ..., 10).
If you graph x1, ..., x9 above the points 1, ..., 9 on the x axis, I think the nine points
fall on a parabola.

36 Suppose Ax = b and Cx = b have the same (complete) solutions for every b.
Is it true that A equals C?

37 Describe the column space of a reduced row echelon matrix R.
3.4 Independence, Basis and Dimension

1 Independent columns of A: The only solution to Ax = 0 is x = 0. The nullspace is Z.

2 Independent vectors: The only zero combination c1v1 + ··· + ckvk = 0 has all c’s = 0.

3 A matrix with m < n has dependent columns: At least n − m free variables / special solutions.

4 The vectors v1, ..., vk span the space S if S = all combinations of the v’s.

5 The vectors v1, ..., vk are a basis for S if they are independent and they span S.

6 The dimension of a space S is the number of vectors in every basis for S.

7 If A is 4 by 4 and invertible, its columns are a basis for R^4. The dimension of R^4 is 4.


This important section is about the true size of a subspace. There are n columns in an 
m by n matrix. But the true “dimension” of the column space is not necessarily n. The 
dimension is measured by counting independent columns—and we have to say what that 
means. We will see that the true dimension of the column space is the rank r. 

The idea of independence applies to any vectors v1, ..., vn in any vector space. Most
of this section concentrates on the subspaces that we know and use—especially the col- 
umn space and the nullspace of A. In the last part we also study “vectors” that are not 
column vectors. They can be matrices and functions; they can be linearly independent (or 
dependent). First come the key examples using column vectors. 

The goal is to understand a basis: independent vectors that “span the space”. 


Every vector in the space is a unique combination of the basis vectors. 


We are at the heart of our subject, and we cannot go on without a basis. The four essential 
ideas in this section (with first hints at their meaning) are: 


1. Independent vectors (no extra vectors) 

2. Spanning a space (enough vectors to produce the rest) 
3. Basis for a space (not too many or too few) 

4. Dimension of a space (the number of vectors in a basis) 


Linear Independence 


Our first definition of independence is not so conventional, but you are ready for it. 


DEFINITION The columns of A are linearly independent when the only solution to
Ax = 0 is x = 0. No other combination Ax of the columns gives the zero vector.

The columns are independent when the nullspace N(A) contains only the zero vector.
Let me illustrate linear independence (and dependence) with three vectors in R^3:


1. If three vectors are not in the same plane, they are independent. No combination of
v1, v2, v3 in Figure 3.4 gives zero except 0v1 + 0v2 + 0v3.

2. If three vectors w1, w2, w3 are in the same plane, they are dependent.

Figure 3.4: Independent vectors v1, v2, v3. Only 0v1 + 0v2 + 0v3 gives the vector 0.
Dependent vectors w1, w2, w3. The combination w1 − w2 + w3 is (0, 0, 0).


This idea of independence applies to 7 vectors in 12-dimensional space. If they are the 
columns of A, and independent, the nullspace only contains x = 0. None of the vectors is 
a combination of the other six vectors. 

Now we choose different words to express the same idea. The following definition of 
independence will apply to any sequence of vectors in any vector space. When the vectors 
are the columns of A, the two definitions say exactly the same thing. 


DEFINITION The sequence of vectors v1, ..., vn is linearly independent if the only
combination that gives the zero vector is 0v1 + 0v2 + ··· + 0vn.

Linear independence     x1v1 + x2v2 + ··· + xnvn = 0 only happens when all x’s are zero.     (1)


If a combination gives 0, when the x’s are not all zero, the vectors are dependent. 

Correct language: “The sequence of vectors is linearly independent.” Acceptable 
shortcut: “The vectors are independent.” Unacceptable: “The matrix is independent.” 

A sequence of vectors is either dependent or independent. They can be combined to 
give the zero vector (with nonzero x’s) or they can’t. So the key question is: Which
combinations of the vectors give zero? We begin with some small examples in R^2:


(a) The vectors (1, 0) and (0, 1) are independent. 

(b) The vectors (1, 0) and (1, 0.00001) are independent. 

(c) The vectors (1, 1) and (—1, —1) are dependent. 

(d) The vectors (1,1) and (0,0) are dependent because of the zero vector. 


(e) In R^2, any three vectors (a, b) and (c, d) and (e, f) are dependent.
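Examples (a) to (d) can be tested mechanically. The sketch below uses a shortcut that this section has not introduced: two vectors in R^2 are independent exactly when the 2 by 2 determinant of the matrix holding them as columns is nonzero. The function name `independent_in_R2` is hypothetical.

```python
def independent_in_R2(v, w):
    """True when v and w are independent: det [v w] = v1*w2 - v2*w1 is nonzero."""
    return v[0] * w[1] - v[1] * w[0] != 0

examples = {
    "(a)": ((1, 0), (0, 1)),
    "(b)": ((1, 0), (1, 0.00001)),
    "(c)": ((1, 1), (-1, -1)),
    "(d)": ((1, 1), (0, 0)),
}

for label, (v, w) in examples.items():
    verdict = "independent" if independent_in_R2(v, w) else "dependent"
    print(label, v, w, verdict)
```

Case (b) is independent even though the two vectors are nearly parallel. The definition asks only whether a nonzero combination gives exactly zero, not whether the vectors are well separated.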
Geometrically, (1, 1) and (−1, −1) are on a line through the origin. They are dependent.
To use the definition, find numbers x1 and x2 so that x1(1, 1) + x2(−1, −1) = (0, 0).
This is the same as solving Ax = 0:

\[
\begin{bmatrix} 1 & -1 \\ 1 & -1 \end{bmatrix}
\begin{bmatrix} x_1 \\ x_2 \end{bmatrix}
= \begin{bmatrix} 0 \\ 0 \end{bmatrix}
\quad \text{for } x_1 = 1 \text{ and } x_2 = 1.
\]


The columns are dependent exactly when there is a nonzero vector in the nullspace. 
If one of the v’s is the zero vector, independence has no chance. Why not? 


Three vectors in R^2 cannot be independent! One way to see this: the matrix A with
those three columns must have a free variable and then a special solution to Ax = 0.
Another way: If the first two vectors are independent, some combination will produce the
third vector. See the second highlight below.

Now move to three vectors in R^3. If one of them is a multiple of another one, these
vectors are dependent. But the complete test involves all three vectors at once. We put
them in a matrix and try to solve Ax = 0.


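Example 1 The columns of this A are dependent. Ax = 0 has a nonzero solution:

\[
Ax = \begin{bmatrix} 1&0&3 \\ 2&1&5 \\ 1&0&3 \end{bmatrix}
\begin{bmatrix} -3 \\ 1 \\ 1 \end{bmatrix}
\quad \text{is} \quad
-3 \begin{bmatrix} 1 \\ 2 \\ 1 \end{bmatrix}
+ 1 \begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix}
+ 1 \begin{bmatrix} 3 \\ 5 \\ 3 \end{bmatrix}
= \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix}
\]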


The rank is only r = 2. Independent columns produce full column rank r = n = 3. 
In that matrix the rows are also dependent. Row 1 minus row 3 is the zero row. For a 
square matrix, we will show that dependent columns imply dependent rows. 


Question How to find that solution to Ax = 0? The systematic way is elimination. 


\[
A = \begin{bmatrix} 1&0&3 \\ 2&1&5 \\ 1&0&3 \end{bmatrix}
\quad \text{reduces to} \quad
R = \begin{bmatrix} 1&0&3 \\ 0&1&-1 \\ 0&0&0 \end{bmatrix}
\]

The solution x = (−3, 1, 1) was exactly the special solution. It shows how the free column
(column 3) is a combination of the pivot columns. That kills independence!
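A short check of Example 1, as a sketch: it confirms that the special solution gives Ax = 0 and that the third column of R, (3, −1, 0), tells how to rebuild column 3 of A from the two pivot columns. The variable names are illustrative.

```python
A = [[1, 0, 3],
     [2, 1, 5],
     [1, 0, 3]]
s = [-3, 1, 1]                      # special solution from Example 1

columns = [list(col) for col in zip(*A)]

# Ax is the combination -3(column 1) + 1(column 2) + 1(column 3)
A_times_s = [sum(a * x for a, x in zip(row, s)) for row in A]

# Column 3 of R is (3, -1, 0): column 3 of A = 3(column 1) - 1(column 2)
rebuilt = [3 * c1 - c2 for c1, c2 in zip(columns[0], columns[1])]

print(A_times_s)                    # [0, 0, 0]
print(rebuilt == columns[2])        # True
```

The special solution is just the negative of those R entries with a 1 in the free position: (−3, 1, 1).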


Full column rank The columns of A are independent exactly when the rank is r = n.
There are n pivots and no free variables. Only x = 0 is in the nullspace.


One case is of special importance because it is clear from the start. Suppose seven 
columns have five components each (m = 5 is less than n = 7). Then the columns must 
be dependent. Any seven vectors from R^5 are dependent. The rank of A cannot be larger
than 5. There cannot be more than five pivots in five rows. Ax = 0 has at least 7 — 5 = 2 
free variables, so it has nonzero solutions—which means that the columns are dependent. 


Any set of n vectors in R^m must be linearly dependent if n > m.

This type of matrix has more columns than rows: it is short and wide. The columns are
certainly dependent if n > m, because Ax = 0 has a nonzero solution.

The columns might be dependent or might be independent if n ≤ m. Elimination will
reveal the r pivot columns. It is those r pivot columns that are independent.


Note Another way to describe linear dependence is this: “One vector is a combination 
of the other vectors.” That sounds clear. Why don’t we say this from the start? Our 
definition was longer: “Some combination gives the zero vector, other than the trivial 
combination with every x = 0.” We must rule out the easy way to get the zero vector. 
That trivial combination of zeros gives every author a headache. If one vector is a
combination of the others, that vector has coefficient x = 1.

The point is, our definition doesn’t pick out one particular vector as guilty. All columns 
of A are treated the same. We look at Ax = 0, and it has a nonzero solution or it hasn’t. In 
the end that is better than asking if the last column (or the first, or a column in the middle) 
is a combination of the others. 


Vectors that Span a Subspace 


The first subspace in this book was the column space. Starting with columns v1, ..., vn,
the subspace was filled out by including all combinations x1v1 + ··· + xnvn. The column
space consists of all combinations Ax of the columns. We now introduce the single word
“span” to describe this: The column space is spanned by the columns.


DEFINITION A set of vectors spans a space if their linear combinations fill the space. 
The columns of a matrix span its column space. They might be dependent. 


Example 2 v1 = (1, 0) and v2 = (0, 1) span the full two-dimensional space R^2.

Example 3 v1 = (1, 0), v2 = (0, 1), v3 = (4, 7) also span the full space R^2.

Example 4 w1 = (1, 1) and w2 = (−1, −1) only span a line in R^2. So does w1 by itself.


Think of two vectors coming out from (0, 0, 0) in 3-dimensional space. Generally they 
span a plane. Your mind fills in that plane by taking linear combinations. Mathematically 
you know other possibilities: two vectors could span a line, three vectors could span all of
R^3, or only a plane. It is even possible that three vectors span only a line, or ten vectors
span only a plane. They are certainly not independent! 

The columns span the column space. Here is a new subspace—which is spanned by the 
rows. The combinations of the rows produce the “row space”. 
DEFINITION The row space of a matrix is the subspace of R^n spanned by the rows.
The row space of A is C(A^T). It is the column space of A^T.

The rows of an m by n matrix have n components. They are vectors in R^n, or they
would be if they were written as column vectors. There is a quick way to fix that: Transpose
the matrix. Instead of the rows of A, look at the columns of A^T. Same numbers, but now
in the column space C(A^T). This row space of A is a subspace of R^n.


Example 5 Describe the column space and the row space of A.

\[
A = \begin{bmatrix} 1&4 \\ 2&7 \\ 3&5 \end{bmatrix}
\quad \text{and} \quad
A^T = \begin{bmatrix} 1&2&3 \\ 4&7&5 \end{bmatrix}.
\quad \text{Here } m = 3 \text{ and } n = 2.
\]

The column space of A is the plane in R^3 spanned by the two columns of A. The row
space of A is spanned by the three rows of A (which are columns of A^T). This row space
is all of R^2. Remember: The rows are in R^n spanning the row space. The columns are in
R^m spanning the column space. Same numbers, different vectors, different spaces.


A Basis for a Vector Space

Two vectors can’t span all of R^3, even if they are independent. Four vectors can’t be
independent, even if they span R^3. We want enough independent vectors to span the
space (and not more). A “basis” is just right.


DEFINITION A basis for a vector space is a sequence of vectors with two properties: 


The basis vectors are linearly independent and they span the space. 


This combination of properties is fundamental to linear algebra. Every vector v in the space
is a combination of the basis vectors, because they span the space. More than that, the
combination that produces v is unique, because the basis vectors v1, ..., vn are independent:


There is one and only one way to write v as a combination of the basis vectors. 


Reason: Suppose v = a1v1 + ··· + anvn and also v = b1v1 + ··· + bnvn. By subtraction
(a1 − b1)v1 + ··· + (an − bn)vn is the zero vector. From the independence of the v’s, each
ai − bi = 0. Hence ai = bi, and there are not two ways to produce v.
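A small instance may make the uniqueness concrete. The basis (1, 1), (1, −1) for R^2 and the vector v = (4, 2) are illustrative choices, not taken from the text:

\[
a \begin{bmatrix} 1 \\ 1 \end{bmatrix} + b \begin{bmatrix} 1 \\ -1 \end{bmatrix}
= \begin{bmatrix} 4 \\ 2 \end{bmatrix}
\quad \Longleftrightarrow \quad
\begin{array}{l} a + b = 4 \\ a - b = 2 \end{array}
\quad \Longleftrightarrow \quad
a = 3, \; b = 1.
\]

Adding and subtracting the two equations forces a = 3 and b = 1, so no second pair of coefficients can produce v.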


Example 6 The columns of \( I = \begin{bmatrix} 1&0 \\ 0&1 \end{bmatrix} \) produce the “standard basis” for R^2.

The basis vectors i = (1, 0) and j = (0, 1) are independent. They span R^2.

Everybody thinks of this basis first. The vector i goes across and j goes straight up. The
columns of the 3 by 3 identity matrix are the standard basis i, j, k. The columns of the n
by n identity matrix give the “standard basis” for R^n.

Now we find many other bases (infinitely many). The basis is not unique! 


Example 7 (Important) The columns of every invertible n by n matrix give a basis for R^n:

\[
\begin{array}{l} \text{Invertible matrix} \\ \text{Independent columns} \\ \text{Column space is } R^3 \end{array}
\quad A = \begin{bmatrix} 1&0&0 \\ 1&1&0 \\ 1&1&1 \end{bmatrix}
\qquad
\begin{array}{l} \text{Singular matrix} \\ \text{Dependent columns} \\ \text{Column space} \neq R^3 \end{array}
\quad B = \begin{bmatrix} 1&0&1 \\ 1&1&2 \\ 1&1&2 \end{bmatrix}
\]

The only solution to Ax = 0 is x = A^{-1}0 = 0. The columns are independent. They span
the whole space R^n, because every vector b is a combination of the columns. Ax = b can
always be solved by x = A^{-1}b. Do you see how everything comes together for invertible
matrices? Here it is in one sentence:

The vectors v1, ..., vn are a basis for R^n exactly when they are the columns of an n by
n invertible matrix. Thus R^n has infinitely many different bases.


When the columns are dependent, we keep only the pivot columns—the first two columns 
of B above, with its two pivots. They are independent and they span the column space. 


The pivot columns of A are a basis for its column space. The pivot rows of A are a basis 
for its row space. So are the pivot rows of its echelon form R. 


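Example 8 This matrix is not invertible. Its columns are not a basis for anything!

\[
\begin{array}{l} \text{One pivot column} \\ \text{One pivot row } (r = 1) \end{array}
\quad A = \begin{bmatrix} 2&4 \\ 1&2 \end{bmatrix}
\quad \text{reduces to} \quad
R = \begin{bmatrix} 1&2 \\ 0&0 \end{bmatrix}
\]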


Column 1 of A is the pivot column. That column alone is a basis for its column space. 
The second column of A would be a different basis. So would any nonzero multiple of that 
column. There is no shortage of bases. One definite choice is the pivot columns. 

Notice that the pivot column (1,0) of this R ends in zero. That column is a basis for 
the column space of R, but it doesn’t belong to the column space of A. The column spaces 
of A and R are different. Their bases are different. (Their dimensions are the same.) 

The row space of A is the same as the row space of R. It contains (2, 4) and (1, 2) and 
all other multiples of those vectors. As always, there are infinitely many bases to choose 
from. One natural choice is to pick the nonzero rows of R (rows with a pivot). So this 
matrix A with rank one has only one vector in the basis:

\[
\text{Basis for the column space: } \begin{bmatrix} 2 \\ 1 \end{bmatrix}. \qquad
\text{Basis for the row space: } \begin{bmatrix} 1 \\ 2 \end{bmatrix}.
\]


The next chapter will come back to these bases for the column space and row space. We 
are happy first with examples where the situation is clear (and the idea of a basis is still 
new). The next example is larger but still clear. 


Example 9 Find bases for the column and row spaces of this rank two matrix:

\[
R = \begin{bmatrix} 1&2&0&3 \\ 0&0&1&4 \\ 0&0&0&0 \end{bmatrix}
\]


Columns 1 and 3 are the pivot columns. They are a basis for the column space (of R!). The
vectors in that column space all have the form b = (x, y, 0). The column space of R is the
“xy plane” inside the full 3-dimensional xyz space. That plane is not R^2, it is a subspace of
R^3. Columns 2 and 3 are also a basis for the same column space. Which pairs of columns
of R are not a basis for its column space?

The row space of R is a subspace of R^4. The simplest basis for that row space is the
two nonzero rows of R. The third row (the zero vector) is in the row space too. But it is 
not in a basis for the row space. The basis vectors must be independent. 


Question Given five vectors in R^7, how do you find a basis for the space they span?


First answer Make them the rows of A, and eliminate to find the nonzero rows of R. 
Second answer Put the five vectors into the columns of A. Eliminate to find the pivot 
columns (of A not R). Those pivot columns are a basis for the column space. 
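The second answer can be sketched in code. This is an illustration only: `pivot_columns` and `basis_for_span` are hypothetical helper names, exact fractions are used to avoid roundoff, and the five sample vectors live in R^3 to keep the example small.

```python
from fractions import Fraction

def pivot_columns(A):
    """Return the indices of the pivot columns found by forward elimination."""
    M = [[Fraction(x) for x in row] for row in A]
    rows, cols = len(M), len(M[0])
    pivots = []
    r = 0                                   # row where the next pivot would go
    for c in range(cols):
        # Look for a nonzero entry in column c, at or below row r.
        pivot_row = next((i for i in range(r, rows) if M[i][c] != 0), None)
        if pivot_row is None:
            continue                        # free column: no pivot here
        M[r], M[pivot_row] = M[pivot_row], M[r]
        for i in range(r + 1, rows):        # clear the entries below the pivot
            factor = M[i][c] / M[r][c]
            M[i] = [a - factor * b for a, b in zip(M[i], M[r])]
        pivots.append(c)
        r += 1
        if r == rows:
            break
    return pivots

def basis_for_span(vectors):
    """Put the vectors into the columns of A and keep the pivot columns of A."""
    A = [list(row) for row in zip(*vectors)]
    return [vectors[j] for j in pivot_columns(A)]

vectors = [(1, 2, 1), (2, 4, 2), (0, 1, 0), (3, 5, 3), (0, 0, 1)]
print(basis_for_span(vectors))   # [(1, 2, 1), (0, 1, 0), (0, 0, 1)]
```

The basis is taken from the original vectors, not from the eliminated matrix. Elimination only tells which columns to keep.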

Could another basis have more vectors, or fewer? This is a crucial question with a good 
answer: No. All bases for a vector space contain the same number of vectors. 


The number of vectors, in any and every basis, is the “dimension” of the space. 


Dimension of a Vector Space 


We have to prove what was just stated. There are many choices for the basis vectors, but 
the number of basis vectors doesn’t change. 


If v1, ..., vm and w1, ..., wn are both bases for the same vector space, then m = n.

Proof Suppose that there are more w’s than v’s. From n > m we want to reach a
contradiction. The v’s are a basis, so w1 must be a combination of the v’s. If w1 equals
a11v1 + ··· + am1vm, this is the first column of a matrix multiplication VA:

\[
\begin{array}{l} \text{Each } w \text{ is a} \\ \text{combination} \\ \text{of the } v\text{'s} \end{array}
\quad
W = \begin{bmatrix} w_1 & w_2 & \cdots & w_n \end{bmatrix}
= \begin{bmatrix} v_1 & \cdots & v_m \end{bmatrix}
\begin{bmatrix} a_{11} & \cdots & a_{1n} \\ \vdots & & \vdots \\ a_{m1} & \cdots & a_{mn} \end{bmatrix}
= VA.
\]

We don’t know each aij, but we know the shape of A (it is m by n). The second vector
w2 is also a combination of the v’s. The coefficients in that combination fill the second
column of A. The key is that A has a row for every v and a column for every w. A is a
short wide matrix, since we assumed n > m. So Ax = 0 has a nonzero solution.

Ax = 0 gives VAx = 0 which is Wx = 0. A combination of the w’s gives zero!
Then the w’s could not be a basis—our assumption n > m is not possible for two bases. 

If m > n we exchange the v’s and w’s and repeat the same steps. The only way to 
avoid a contradiction is to have m = n. This completes the proof that m = n. 
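The key step of the proof, that Ax = 0 forces Wx = 0, can be seen on a tiny case. In this sketch every number is an illustrative assumption: the v’s are a basis for the xy plane (m = 2) and three w’s (n = 3) are built from them.

```python
def matmul(X, Y):
    """Multiply matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

# Columns of V are v1 = (1, 0, 0) and v2 = (0, 1, 0).
V = [[1, 0],
     [0, 1],
     [0, 0]]

# Column j of A holds the coefficients that build w_j from the v's.
# A is m by n = 2 by 3: short and wide.
A = [[1, 1, 2],
     [1, -1, 3]]

W = matmul(V, A)            # columns w1, w2, w3

x = [[-5], [1], [2]]        # a nonzero solution of Ax = 0
print(matmul(A, x))         # [[0], [0]]
print(matmul(W, x))         # [[0], [0], [0]]
```

The combination −5w1 + w2 + 2w3 is zero, so these three w’s are dependent and cannot be a basis for the plane.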


The number of basis vectors depends on the space—not on a particular basis. The 
number is the same for every basis, and it counts the “degrees of freedom” in the space. 
The dimension of the space R^n is n. We now introduce the important word dimension
for other vector spaces too. 


DEFINITION The dimension of a space is the number of vectors in every basis. 


This matches our intuition. The line through v = (1, 5,2) has dimension one. It is a sub- 
space with this one vector v in its basis. Perpendicular to that line is the plane 
x + 5y + 2z = 0. This plane has dimension 2. To prove it, we find a basis (—5, 1,0) 
and (—2,0,1). The dimension is 2 because the basis contains two vectors. 

The plane is the nullspace of the matrix A = [1 5 2], which has two free variables.
Our basis vectors (—5,1,0) and (—2,0,1) are the “special solutions” to Ax = 0. The 
next section shows that the n — r special solutions always give a basis for the nullspace. 
C(A) has dimension r and the nullspace N (A) has dimension n — r. 
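The claim that the two special solutions are a basis for the plane x + 5y + 2z = 0 can be checked on samples. A minimal sketch with hypothetical helper names:

```python
s1 = (-5, 1, 0)     # special solution with y = 1, z = 0
s2 = (-2, 0, 1)     # special solution with y = 0, z = 1

def on_plane(v):
    """True when v satisfies x + 5y + 2z = 0."""
    return v[0] + 5 * v[1] + 2 * v[2] == 0

def combine(y, z):
    """The combination y*s1 + z*s2; its last two components are y and z."""
    return tuple(y * a + z * b for a, b in zip(s1, s2))

print(on_plane(s1), on_plane(s2))   # True True
print(combine(3, -4))               # (-7, 3, -4)
print(on_plane(combine(3, -4)))     # True
```

A point (x, y, z) on the plane has x = −5y − 2z, so it equals `combine(y, z)`. The free variables y and z are the only possible coefficients, which is the uniqueness a basis requires.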


Note about the language of linear algebra We never say “the rank of a space” or “the 
dimension of a basis” or “the basis of a matrix”. Those terms have no meaning. It is the 
dimension of the column space that equals the rank of the matrix. 


Bases for Matrix Spaces and Function Spaces 


The words “independence” and “basis” and “dimension” are not at all restricted to column 
vectors. We can ask whether three matrices A1, A2, A3 are independent. When they are in 
the space of all 3 by 4 matrices, some combination might give the zero matrix. We can also 
ask the dimension of the full 3 by 4 matrix space. (It is 12.) 

In differential equations, d^2y/dx^2 = y has a space of solutions. One basis is y = e^x
and y = e^{−x}. Counting the basis functions gives the dimension 2 for the space of all
solutions. (The dimension is 2 because of the second derivative.)

Matrix spaces and function spaces may look a little strange after R^n. But in some
way, you haven’t got the ideas of basis and dimension straight until you can apply them to 
“vectors” other than column vectors. 


Matrix spaces The vector space M contains all 2 by 2 matrices. Its dimension is 4.

\[
\text{One basis is} \quad A_1, A_2, A_3, A_4 =
\begin{bmatrix} 1&0 \\ 0&0 \end{bmatrix},
\begin{bmatrix} 0&1 \\ 0&0 \end{bmatrix},
\begin{bmatrix} 0&0 \\ 1&0 \end{bmatrix},
\begin{bmatrix} 0&0 \\ 0&1 \end{bmatrix}.
\]

Those matrices are linearly independent. We are not looking at their columns, but at the
whole matrix. Combinations of those four matrices can produce any matrix in M, so they
span the space:

\[
\begin{array}{l} \text{Every } A \text{ combines} \\ \text{the basis matrices} \end{array}
\quad
c_1A_1 + c_2A_2 + c_3A_3 + c_4A_4 = \begin{bmatrix} c_1 & c_2 \\ c_3 & c_4 \end{bmatrix} = A.
\]

A is zero only if the c’s are all zero. This proves independence of A1, A2, A3, A4.

The three matrices A1, A2, A4 are a basis for a subspace: the upper triangular
matrices. Its dimension is 3. A1 and A4 are a basis for the diagonal matrices. What is
a basis for the symmetric matrices? Keep A1 and A4, and throw in A2 + A3.

To push this further, think about the space of all n by n matrices. One possible basis
uses matrices that have only a single nonzero entry (that entry is 1). There are n^2 positions
for that 1, so there are n^2 basis matrices:

The dimension of the whole n by n matrix space is n^2.
The dimension of the subspace of upper triangular matrices is (1/2)n^2 + (1/2)n.
The dimension of the subspace of diagonal matrices is n.
The dimension of the subspace of symmetric matrices is (1/2)n^2 + (1/2)n (why?).
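These four dimensions can be found by counting the entries that may be chosen freely, one basis matrix per free entry. The sketch below does that count; `count_free_entries` and `dimensions` are hypothetical names.

```python
def count_free_entries(n, is_free):
    """Count positions (i, j) of an n by n matrix that can be chosen freely."""
    return sum(1 for i in range(n) for j in range(n) if is_free(i, j))

def dimensions(n):
    return {
        "all matrices": count_free_entries(n, lambda i, j: True),
        "upper triangular": count_free_entries(n, lambda i, j: i <= j),
        "diagonal": count_free_entries(n, lambda i, j: i == j),
        # For a symmetric matrix, entry (i, j) with i <= j also fixes entry (j, i).
        "symmetric": count_free_entries(n, lambda i, j: i <= j),
    }

for n in (2, 3, 4):
    print(n, dimensions(n))
```

For n = 2 the counts are 4, 3, 2, 3, which agrees with the 2 by 2 bases given above. The symmetric count equals the upper triangular count because the entries on and above the diagonal determine the rest.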


Function spaces The equations d^2y/dx^2 = 0 and d^2y/dx^2 = −y and d^2y/dx^2 = y
involve the second derivative. In calculus we solve to find the functions y(x):

y'' = 0 is solved by any linear function y = cx + d
y'' = −y is solved by any combination y = c sin x + d cos x
y'' = y is solved by any combination y = ce^x + de^{−x}.

That solution space for y'' = −y has two basis functions: sin x and cos x. The space
for y'' = 0 has x and 1. It is the “nullspace” of the second derivative! The dimension is 2
in each case (these are second-order equations).

The solutions of y'' = 2 don’t form a subspace: the right side b = 2 is not zero. A
particular solution is y(x) = x^2. The complete solution is y(x) = x^2 + cx + d. All
those functions satisfy y'' = 2. Notice the particular solution plus any function cx + d
in the nullspace. A linear differential equation is like a linear matrix equation Ax = b.
But we solve it by calculus instead of linear algebra.
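The pattern "particular solution plus nullspace" for y'' = 2 can be checked without calculus software. This sketch replaces the second derivative by a centered second difference, which is an assumption of the example. It is exact for polynomials of degree two or less, so the check below is exact with fractions. The function names are hypothetical.

```python
from fractions import Fraction

def second_difference(y, x, h):
    """Centered second difference of y at x; exact for quadratics."""
    return (y(x + h) - 2 * y(x) + y(x - h)) / (h * h)

def complete_solution(c, d):
    """y(x) = x^2 + cx + d: particular solution x^2 plus a nullspace function."""
    return lambda x: x * x + c * x + d

def nullspace_part(c, d):
    """y(x) = cx + d, which solves y'' = 0."""
    return lambda x: c * x + d

h = Fraction(1, 10)
for c, d in [(0, 0), (3, -1), (Fraction(-7, 2), 5)]:
    for x in [Fraction(0), Fraction(2), Fraction(-5, 3)]:
        assert second_difference(complete_solution(c, d), x, h) == 2
        assert second_difference(nullspace_part(c, d), x, h) == 0
print("all checks passed")
```

Every choice of c and d gives the same right side 2, just as every x_p + x_n gives the same b in Ax = b.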


3.4. Independence, Basis and Dimension 173 


We end here with the space Z that contains only the zero vector. The dimension of this 
space is zero. The empty set (containing no vectors) is a basis for Z. We can never allow 
the zero vector into a basis, because then linear independence is lost. 


= REVIEW OF THE KEY IDEAS = 


1. The columns of A are independent if x = 0 is the only solution to Ax = 0. 
2. The vectors v1, ..., vn span a space if their combinations fill that space.


3. A basis consists of linearly independent vectors that span the space. Every vector 
in the space is a unique combination of the basis vectors. 


4. All bases for a space have the same number of vectors. This number of vectors in a 
basis is the dimension of the space. 


5. The pivot columns are one basis for the column space. The dimension is r. 


= WORKED EXAMPLES = 


3.4 A Start with the vectors $v_1 = (1,2,0)$ and $v_2 = (2,3,0)$. (a) Are they linearly 
independent? (b) Are they a basis for any space? (c) What space V do they span? 
(d) What is the dimension of V? (e) Which matrices A have V as their column space? 
(f) Which matrices have V as their nullspace? (g) Describe all vectors $v_3$ that complete 
a basis $v_1, v_2, v_3$ for $R^3$. 


Solution 
(a) $v_1$ and $v_2$ are independent: the only combination to give 0 is $0v_1 + 0v_2$. 
(b) Yes, they are a basis for the space they span. 
(c) That space V contains all vectors (x, y, 0). It is the xy plane in $R^3$. 
(d) The dimension of V is 2 since the basis contains two vectors. 


(e) This V is the column space of any 3 by n matrix A of rank 2, if every column is a 
combination of $v_1$ and $v_2$. In particular A could just have columns $v_1$ and $v_2$. 


(f) This V is the nullspace of any m by 3 matrix B of rank 1, if every row is a multiple 
of (0, 0, 1). In particular take B = [0 0 1]. Then $Bv_1 = 0$ and $Bv_2 = 0$. 


(g) Any third vector $v_3 = (a, b, c)$ will complete a basis for $R^3$ provided $c \neq 0$. 

3.4 B Start with three independent vectors $w_1, w_2, w_3$. Take combinations of those 
vectors to produce $v_1, v_2, v_3$. Write the combinations in matrix form as V = WB: 


$$
\begin{aligned}
v_1 &= w_1 + w_2 \\
v_2 &= w_1 + 2w_2 + w_3 \\
v_3 &= w_2 + cw_3
\end{aligned}
\quad\text{which is}\quad
\begin{bmatrix} v_1 & v_2 & v_3 \end{bmatrix}
=
\begin{bmatrix} w_1 & w_2 & w_3 \end{bmatrix}
\begin{bmatrix} 1 & 1 & 0 \\ 1 & 2 & 1 \\ 0 & 1 & c \end{bmatrix}
$$


What is the test on B to see if V = WB has independent columns? If $c \neq 1$ show 
that $v_1, v_2, v_3$ are linearly independent. If c = 1 show that the v’s are linearly dependent. 


Solution The test on V for independence of its columns was in our first definition: 
The nullspace of V must contain only the zero vector. Then x = (0, 0, 0) is the only 
combination of the columns that gives Vx = zero vector. 

If c = 1 in our problem, we can see dependence in two ways. First, $v_1 + v_3$ will be 
the same as $v_2$. (If you add $w_1 + w_2$ to $w_2 + w_3$ you get $w_1 + 2w_2 + w_3$ which is $v_2$.) 
In other words $v_1 - v_2 + v_3 = 0$, which says that the v’s are not independent. 

The other way is to look at the nullspace of B. If c = 1, the vector x = (1, -1, 1) is in 
that nullspace, and Bx = 0. Then certainly WBx = 0 which is the same as Vx = 0. So 
the v’s are dependent. This specific x = (1, -1, 1) from the nullspace tells us again that 
$v_1 - v_2 + v_3 = 0$. 

Now suppose $c \neq 1$. Then the matrix B is invertible. So if x is any nonzero vector we 
know that Bx is nonzero. Since the w’s are given as independent, we further know that 
WBx is nonzero. Since V = WB, this says that x is not in the nullspace of V. In other 
words $v_1, v_2, v_3$ are independent. 

The general rule is “independent v’s from independent w’s when B is invertible”. 
And if these vectors are in $R^3$, they are not only independent, they are a basis for $R^3$. 
“Basis of v’s from basis of w’s when the change of basis matrix B is invertible.” 
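
The claim that B is invertible exactly when c is not 1 can be confirmed with a determinant. The text does not use determinants at this point, so this is offered only as a side check, expanding along the first row of B:

```latex
\det B =
\begin{vmatrix} 1 & 1 & 0 \\ 1 & 2 & 1 \\ 0 & 1 & c \end{vmatrix}
= 1\,(2c - 1) \;-\; 1\,(c - 0) \;+\; 0
= c - 1 .
```

So B is singular only for c = 1, which matches the nullspace vector x = (1, -1, 1) found in that case.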


3.4 C (Important example) Suppose $v_1, \ldots, v_n$ is a basis for $R^n$ and the n by n matrix 
A is invertible. Show that $Av_1, \ldots, Av_n$ is also a basis for $R^n$. 


Solution In matrix language: Put the basis vectors $v_1, \ldots, v_n$ in the columns of an 
invertible(!) matrix V. Then $Av_1, \ldots, Av_n$ are the columns of AV. Since A is invertible, 
so is AV and its columns give a basis. 

In vector language: Suppose $c_1Av_1 + \cdots + c_nAv_n = 0$. This is Av = 0 with 
$v = c_1v_1 + \cdots + c_nv_n$. Multiply by $A^{-1}$ to reach v = 0. By linear independence of the v’s, 
all $c_i = 0$. This shows that the Av’s are independent. 

To show that the Av’s span $R^n$, solve $c_1Av_1 + \cdots + c_nAv_n = b$ which is the same as 
$c_1v_1 + \cdots + c_nv_n = A^{-1}b$. Since the v’s are a basis, this must be solvable. 

Problem Set 3.4 


Questions 1-10 are about linear independence and linear dependence. 


1 Show that $v_1, v_2, v_3$ are independent but $v_1, v_2, v_3, v_4$ are dependent: 

$$
v_1 = \begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix} \quad
v_2 = \begin{bmatrix} 1 \\ 1 \\ 0 \end{bmatrix} \quad
v_3 = \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix} \quad
v_4 = \begin{bmatrix} 2 \\ 3 \\ 4 \end{bmatrix}
$$

Solve $c_1v_1 + c_2v_2 + c_3v_3 + c_4v_4 = 0$ or Ax = 0. The v’s go in the columns of A. 


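2 (Recommended) Find the largest possible number of independent vectors among 

$$
v_1 = \begin{bmatrix} 1 \\ -1 \\ 0 \\ 0 \end{bmatrix} \quad
v_2 = \begin{bmatrix} 1 \\ 0 \\ -1 \\ 0 \end{bmatrix} \quad
v_3 = \begin{bmatrix} 1 \\ 0 \\ 0 \\ -1 \end{bmatrix} \quad
v_4 = \begin{bmatrix} 0 \\ 1 \\ -1 \\ 0 \end{bmatrix} \quad
v_5 = \begin{bmatrix} 0 \\ 1 \\ 0 \\ -1 \end{bmatrix} \quad
v_6 = \begin{bmatrix} 0 \\ 0 \\ 1 \\ -1 \end{bmatrix}
$$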


3 Prove that if a = 0 or d = 0 or f = 0 (3 cases), the columns of U are dependent: 

$$
U = \begin{bmatrix} a & b & c \\ 0 & d & e \\ 0 & 0 & f \end{bmatrix}
$$

4 If a, d, f in Question 3 are all nonzero, show that the only solution to Ux = 0 is 
x = 0. Then the upper triangular U has independent columns. 
5 Decide the dependence or independence of 


(a) the vectors (1,3,2) and (2,1,3) and (3,2, 1) 
(b) the vectors (1, —3, 2) and (2, 1, —3) and (—3, 2, 1). 


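6 Choose three independent columns of U. Then make two other choices. Do the same 
for A. 

$$
U = \begin{bmatrix} 2 & 3 & 4 & 1 \\ 0 & 6 & 7 & 0 \\ 0 & 0 & 0 & 9 \\ 0 & 0 & 0 & 0 \end{bmatrix}
\qquad
A = \begin{bmatrix} 2 & 3 & 4 & 1 \\ 0 & 6 & 7 & 0 \\ 0 & 0 & 0 & 9 \\ 4 & 6 & 8 & 2 \end{bmatrix}
$$
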
7 If $w_1, w_2, w_3$ are independent vectors, show that the differences $v_1 = w_2 - w_3$ and 
$v_2 = w_1 - w_3$ and $v_3 = w_1 - w_2$ are dependent. Find a combination of the v’s 
that gives zero. Which matrix A in $[\,v_1 \; v_2 \; v_3\,] = [\,w_1 \; w_2 \; w_3\,]\,A$ is singular? 

8 If $w_1, w_2, w_3$ are independent vectors, show that the sums $v_1 = w_2 + w_3$ and 
$v_2 = w_1 + w_3$ and $v_3 = w_1 + w_2$ are independent. (Write $c_1v_1 + c_2v_2 + c_3v_3 = 0$ 
in terms of the w’s. Find and solve equations for the c’s, to show they are zero.) 

9 Suppose $v_1, v_2, v_3, v_4$ are vectors in $R^3$. 
(a) These four vectors are dependent because ____. 
(b) The two vectors $v_1$ and $v_2$ will be dependent if ____. 
(c) The vectors $v_1$ and (0, 0, 0) are dependent because ____. 


10 Find two independent vectors on the plane x + 2y - 3z - t = 0 in $R^4$. Then find three 
independent vectors. Why not four? This plane is the nullspace of what matrix? 


Questions 11-14 are about the space spanned by a set of vectors. Take all linear com- 
binations of the vectors. 


11 Describe the subspace of $R^3$ (is it a line or plane or $R^3$?) spanned by 
(a) the two vectors (1, 1, -1) and (-1, -1, 1) 
(b) the three vectors (0, 1, 1) and (1, 1, 0) and (0, 0, 0) 
(c) all vectors in $R^3$ with whole number components 
(d) all vectors with positive components. 


12 The vector b is in the subspace spanned by the columns of A when ____ has a 
solution. The vector c is in the row space of A when ____ has a solution. 


True or false: If the zero vector is in the row space, the rows are dependent. 


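13 Find the dimensions of these 4 spaces. Which two of the spaces are the same? (a) column 
space of A, (b) column space of U, (c) row space of A, (d) row space of U: 


$$
A = \begin{bmatrix} 1 & 1 & 0 \\ 1 & 3 & 1 \\ 3 & 1 & -1 \end{bmatrix}
\quad\text{and}\quad
U = \begin{bmatrix} 1 & 1 & 0 \\ 0 & 2 & 1 \\ 0 & 0 & 0 \end{bmatrix}
$$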


14 v+w and v — w are combinations of v and w. Write v and w as combinations of 
v + w and v — w. The two pairs of vectors ____ the same space. When are they a 
basis for the same space? 


Questions 15-25 are about the requirements for a basis. 


15 If $v_1, \ldots, v_n$ are linearly independent, the space they span has dimension ____. 
These vectors are a ____ for that space. If the vectors are the columns of an m by 
n matrix, then m is ____ than n. If m = n, that matrix is ____. 


16 Find a basis for each of these subspaces of $R^4$: 


(a) All vectors whose components are equal. 

(b) All vectors whose components add to zero. 

(c) All vectors that are perpendicular to (1, 1,0, 0) and (1,0, 1,1). 
(d) The column space and the nullspace of I (4 by 4). 


17 Find three different bases for the column space of $U = \begin{bmatrix} 1 & 0 & 1 & 0 & 1 \\ 0 & 1 & 0 & 1 & 0 \end{bmatrix}$. Then find two 
different bases for the row space of U. 


18 Suppose $v_1, v_2, \ldots, v_6$ are six vectors in $R^4$. 

(a) Those vectors (do)(do not)(might not) span $R^4$. 
(b) Those vectors (are)(are not)(might be) linearly independent. 
(c) Any four of those vectors (are)(are not)(might be) a basis for $R^4$. 


19 The columns of A are n vectors from $R^m$. If they are linearly independent, what is 
the rank of A? If they span $R^m$, what is the rank? If they are a basis for $R^m$, what 
then? Looking ahead: The rank r counts the number of ____ columns. 


20 Find a basis for the plane x - 2y + 3z = 0 in $R^3$. Then find a basis for the intersection 
of that plane with the xy plane. Then find a basis for all vectors perpendicular to the 
plane. 


21 Suppose the columns of a 5 by 5 matrix A are a basis for $R^5$. 

(a) The equation Ax = 0 has only the solution x = 0 because ____. 
(b) If b is in $R^5$ then Ax = b is solvable because the basis vectors ____ $R^5$. 

Conclusion: A is invertible. Its rank is 5. Its rows are also a basis for $R^5$. 


22 Suppose S is a 5-dimensional subspace of $R^6$. True or false (example if false): 

(a) Every basis for S can be extended to a basis for $R^6$ by adding one more vector. 
(b) Every basis for $R^6$ can be reduced to a basis for S by removing one vector. 


23 U comes from A by subtracting row 1 from row 3: 

$$
A = \begin{bmatrix} 1 & 3 & 2 \\ 0 & 1 & 1 \\ 1 & 3 & 2 \end{bmatrix}
\quad\text{and}\quad
U = \begin{bmatrix} 1 & 3 & 2 \\ 0 & 1 & 1 \\ 0 & 0 & 0 \end{bmatrix}
$$

Find bases for the two column spaces. Find bases for the two row spaces. Find bases 
for the two nullspaces. Which spaces stay fixed in elimination? 


24 True or false (give a good reason): 

(a) If the columns of a matrix are dependent, so are the rows. 
(b) The column space of a 2 by 2 matrix is the same as its row space. 
(c) The column space of a 2 by 2 matrix has the same dimension as its row space. 
(d) The columns of a matrix are a basis for the column space. 


25 For which numbers c and d do these matrices have rank 2? 

$$
A = \begin{bmatrix} 1 & 2 & 5 & 0 & 5 \\ 0 & 0 & c & 2 & 2 \\ 0 & 0 & 0 & d & 2 \end{bmatrix}
\quad\text{and}\quad
B = \begin{bmatrix} c & d \\ d & c \end{bmatrix}
$$


Questions 26-30 are about spaces where the “vectors” are matrices. 

26 Find a basis (and the dimension) for each of these subspaces of 3 by 3 matrices: 
(a) All diagonal matrices. 
(b) All symmetric matrices ($A^T = A$). 
(c) All skew-symmetric matrices ($A^T = -A$). 


27 Construct six linearly independent 3 by 3 echelon matrices $U_1, \ldots, U_6$. 


28 Find a basis for the space of all 2 by 3 matrices whose columns add to zero. Find a 
basis for the subspace whose rows also add to zero. 


29 What subspace of 3 by 3 matrices is spanned (take all combinations) by 


(a) the invertible matrices? 
(b) the rank one matrices? 
(c) the identity matrix? 
30 Find a basis for the space of 2 by 3 matrices whose nullspace contains (2, 1, 1). 


Questions 31-35 are about spaces where the “vectors” are functions. 
31 (a) Find all functions that satisfy $dy/dx = 0$. 
(b) Choose a particular function that satisfies $dy/dx = 3$. 
(c) Find all functions that satisfy $dy/dx = 3$. 


32 The cosine space $F_3$ contains all combinations $y(x) = A\cos x + B\cos 2x + C\cos 3x$. 
Find a basis for the subspace with y(0) = 0. 


33 Find a basis for the space of functions that satisfy 


(a) $dy/dx - 2y = 0$ 
(b) $dy/dx - y/x = 0$. 


34 Suppose $y_1(x), y_2(x), y_3(x)$ are three different functions of x. The vector space they 
span could have dimension 1, 2, or 3. Give an example of $y_1, y_2, y_3$ to show each 
possibility. 


35 Find a basis for the space of polynomials p(x) of degree $\leq 3$. Find a basis for the 
subspace with p(1) = 0. 


36 Find a basis for the space S of vectors (a, b, c, d) with a + c + d = 0 and also for the 
space T with a + b = 0 and c = 2d. What is the dimension of the intersection $S \cap T$? 


37 If AS = SA for the shift matrix S, show that A must have this special form: 

$$
\text{If }
\begin{bmatrix} a & b & c \\ d & e & f \\ g & h & i \end{bmatrix}
\begin{bmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 0 & 0 & 0 \end{bmatrix}
=
\begin{bmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 0 & 0 & 0 \end{bmatrix}
\begin{bmatrix} a & b & c \\ d & e & f \\ g & h & i \end{bmatrix}
\text{ then }
A = \begin{bmatrix} a & b & c \\ 0 & a & b \\ 0 & 0 & a \end{bmatrix}.
$$

“The subspace of matrices that commute with the shift S has dimension ____.” 


38 Which of the following are bases for $R^3$? 

(a) (1, 2, 0) and (0, 1, -1) 
(b) (1, 1, -1), (2, 3, 4), (4, 1, -1), (0, 1, -1) 
(c) (1, 2, 2), (-1, 2, 1), (0, 8, 0) 
(d) (1, 2, 2), (-1, 2, 1), (0, 8, 6) 


39 Suppose A is 5 by 4 with rank 4. Show that Ax = b has no solution when the 5 by 5 
matrix [A b] is invertible. Show that Ax = b is solvable when [A b] is singular. 


40 (a) Find a basis for all solutions to $d^4y/dx^4 = y(x)$. 
(b) Find a particular solution to $d^4y/dx^4 = y(x) + 1$. Find the complete solution. 


Challenge Problems 


41 Write the 3 by 3 identity matrix as a combination of the other five permutation 
matrices! Then show that those five matrices are linearly independent. (Assume a 
combination gives $c_1P_1 + \cdots + c_5P_5$ = zero matrix, and check entries to prove that 
$c_1$ to $c_5$ must all be zero.) The five permutations are a basis for the subspace of 3 by 
3 matrices with row and column sums all equal. 


42 Choose $x = (x_1, x_2, x_3, x_4)$ in $R^4$. It has 24 rearrangements like $(x_2, x_1, x_3, x_4)$ 
and $(x_4, x_3, x_1, x_2)$. Those 24 vectors, including x itself, span a subspace S. Find 
specific vectors x so that the dimension of S is: (a) zero, (b) one, (c) three, (d) four. 


43 Intersections and sums have $\dim(V) + \dim(W) = \dim(V \cap W) + \dim(V + W)$. 
Start with a basis $u_1, \ldots, u_r$ for the intersection $V \cap W$. Extend with $v_1, \ldots, v_s$ to 
a basis for V, and separately with $w_1, \ldots, w_t$ to a basis for W. Prove that the u’s, 
v’s and w’s together are independent. The dimensions have (r + s) + (r + t) = 
(r) + (r + s + t) as desired. 


44 Mike Artin suggested a neat higher-level proof of that dimension formula in Problem 43. 
From all inputs v in V and w in W, the “sum transformation” produces 
v + w. Those outputs fill the space V + W. The nullspace contains all pairs v = u, 
w = -u for vectors u in $V \cap W$. (Then v + w = u - u = 0.) So $\dim(V + W) + 
\dim(V \cap W)$ equals $\dim(V) + \dim(W)$ (input dimension from V and W) by the 
Counting Theorem. 


dimension of outputs + dimension of nullspace = dimension of inputs. 


Problem For an m by n matrix of rank r, what are those 3 dimensions? Outputs = 
column space. This question will be answered in Section 3.5. Can you do it now? 


45 Inside $R^n$, suppose dimension (V) + dimension (W) > n. Show that some nonzero 
vector is in both V and W. 


46 Suppose A is 10 by 10 and $A^2 = 0$ (zero matrix). So A multiplies each column of 
A to give the zero vector. This means that the column space of A is contained in the 
____. If A has rank r, those subspaces have dimension $r \leq 10 - r$. So the rank is 
$r \leq 5$. 




3.5 Dimensions of the Four Subspaces 


1 The column space C(A) and the row space $C(A^T)$ both have dimension r (the rank of A). 
2 The nullspace N(A) has dimension n - r. The left nullspace $N(A^T)$ has dimension m - r. 
3 Elimination produces bases for the row space and nullspace of A: They are the same as for R. 
4 Elimination often changes the column space and left nullspace (but dimensions don’t change). 
5 Rank one matrices: $A = uv^T$ = column times row: C(A) has basis u, $C(A^T)$ has basis v. 


The main theorem in this chapter connects rank and dimension. The rank of a matrix 
is the number of pivots. The dimension of a subspace is the number of vectors in a basis. 
We count pivots or we count basis vectors. The rank of A reveals the dimensions of 
all four fundamental subspaces. Here are the subspaces, including the new one. 

Two subspaces come directly from A, and the other two from $A^T$: 


Four Fundamental Subspaces 
1. The row space is $C(A^T)$, a subspace of $R^n$. 
2. The column space is C(A), a subspace of $R^m$. 
3. The nullspace is N(A), a subspace of $R^n$. 
4. The left nullspace is $N(A^T)$, a subspace of $R^m$. This is our new space. 


In this book the column space and nullspace came first. We know C(A) and N (A) pretty 
well. Now the other two subspaces come forward. The row space contains all combinations 
of the rows. This row space of A is the column space of $A^T$. 

For the left nullspace we solve $A^Ty = 0$. That system is n by m. This is the nullspace 
of $A^T$. The vectors y go on the left side of A when the equation is written $y^TA = 0^T$. 
The matrices A and $A^T$ are usually different. So are their column spaces and their nullspaces. 
But those spaces are connected in an absolutely beautiful way. 

Part 1 of the Fundamental Theorem finds the dimensions of the four subspaces. One fact 
stands out: The row space and column space have the same dimension r. This number r 
is the rank of the matrix. The other important fact involves the two nullspaces: 


N(A) and $N(A^T)$ have dimensions n - r and m - r, to make up the full n and m. 


Part 2 of the Fundamental Theorem will describe how the four subspaces fit together 
(two in $R^n$ and two in $R^m$). That completes the “right way” to understand every Ax = b. 
Stay with it: you are doing real mathematics. 

The Four Subspaces for R 


Suppose A is reduced to its row echelon form R. For that special form, the four subspaces 
are easy to identify. We will find a basis for each subspace and check its dimension. Then 
we watch how the subspaces change (two of them don’t change!) as we look back at A. 
The main point is that the four dimensions are the same for A and R. 

As a specific 3 by 5 example, look at the four subspaces for this echelon matrix R: 


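$$
\begin{matrix} m = 3 \\ n = 5 \\ r = 2 \end{matrix}
\qquad
R = \begin{bmatrix} 1 & 3 & 5 & 0 & 7 \\ 0 & 0 & 0 & 1 & 2 \\ 0 & 0 & 0 & 0 & 0 \end{bmatrix}
\qquad
\begin{matrix} \text{pivot rows 1 and 2} \\ \text{pivot columns 1 and 4} \end{matrix}
$$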


The rank of this matrix is r = 2 (two pivots). Take the four subspaces in order. 
1. The row space of R has dimension 2, matching the rank. 


Reason: The first two rows are a basis. The row space contains combinations of all three 
rows, but the third row (the zero row) adds nothing new. So rows 1 and 2 span the row 
space $C(R^T)$. 

The pivot rows 1 and 2 are independent. That is obvious for this example, and it is 
always true. If we look only at the pivot columns, we see the r by r identity matrix. 
There is no way to combine its rows to give the zero row (except by the combination with 
all coefficients zero). So the r pivot rows are a basis for the row space. 


The dimension of the row space is the rank r. The nonzero rows of R form a basis. 


2. The column space of R also has dimension r = 2. 


Reason: The pivot columns 1 and 4 form a basis for C(R). They are independent because 
they start with the r by r identity matrix. No combination of those pivot columns can give 
the zero column (except the combination with all coefficients zero). And they also span the 
column space. Every other (free) column is a combination of the pivot columns. Actually 
the combinations we need are the three special solutions! 


Column 2 is 3 (column 1). The special solution is (-3, 1, 0, 0, 0). 
Column 3 is 5 (column 1). The special solution is (-5, 0, 1, 0, 0). 
Column 5 is 7 (column 1) + 2 (column 4). That solution is (-7, 0, 0, -2, 1). 


The pivot columns are independent, and they span, so they are a basis for C(R). 


The dimension of the column space is the rank r. The pivot columns form a basis. 

3. The nullspace of R has dimension n - r = 5 - 2. There are n - r = 3 free variables. 
Here $x_2, x_3, x_5$ are free (no pivots in those columns). They yield the three special 
solutions to Rx = 0. Set a free variable to 1, and solve for $x_1$ and $x_4$. 


$$
s_2 = \begin{bmatrix} -3 \\ 1 \\ 0 \\ 0 \\ 0 \end{bmatrix} \quad
s_3 = \begin{bmatrix} -5 \\ 0 \\ 1 \\ 0 \\ 0 \end{bmatrix} \quad
s_5 = \begin{bmatrix} -7 \\ 0 \\ 0 \\ -2 \\ 1 \end{bmatrix}
\qquad
\begin{matrix} Rx = 0 \text{ has the complete solution} \\ x = x_2s_2 + x_3s_3 + x_5s_5 \\ \text{The nullspace has dimension 3.} \end{matrix}
$$


Reason: There is a special solution for each free variable. With n variables and r pivots, 
that leaves n — r free variables and special solutions. The special solutions are independent, 
because they contain the identity matrix in rows 2, 3, 5. So N(R) has dimension n — r. 


The nullspace has dimension n — r. The special solutions form a basis. 
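
The recipe "set one free variable to 1 and solve for the pivot variables" can be sketched in a few lines of Python. This is an illustrative sketch that assumes the input is already in reduced row echelon form, as R is here. The function names `pivot_columns` and `special_solutions` are hypothetical, and column indices are 0-based.

```python
from fractions import Fraction

R = [[1, 3, 5, 0, 7],
     [0, 0, 0, 1, 2],
     [0, 0, 0, 0, 0]]

def pivot_columns(R):
    """0-based column of the leading nonzero entry in each nonzero row."""
    return [next(j for j, x in enumerate(row) if x != 0)
            for row in R if any(row)]

def special_solutions(R):
    """One special solution per free column; assumes R is in reduced echelon form."""
    n = len(R[0])
    pivots = pivot_columns(R)
    free = [j for j in range(n) if j not in pivots]
    solutions = []
    for f in free:
        x = [Fraction(0)] * n
        x[f] = Fraction(1)                 # set this free variable to 1
        for row, p in zip(R, pivots):      # each pivot row fixes one pivot variable
            x[p] = -Fraction(row[f])
        solutions.append(x)
    return solutions

def matvec(M, x):
    return [sum(a * b for a, b in zip(row, x)) for row in M]

rank = len(pivot_columns(R))
sols = special_solutions(R)
print("rank:", rank, "nullspace dimension:", len(sols))
for s in sols:
    print([int(v) for v in s], "-> R s =", [int(v) for v in matvec(R, s)])
```

The count of special solutions is n - r because there is exactly one for each column without a pivot.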


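4. The nullspace of $R^T$ (left nullspace of R) has dimension m - r = 3 - 2. 


Reason: The equation $R^Ty = 0$ looks for combinations of the columns of $R^T$ (the rows 
of R) that produce zero. This equation $R^Ty = 0$ or $y^TR = 0^T$ is 


$$
\begin{matrix} \text{Left nullspace} \\ \text{Combination} \\ \text{of rows is zero} \end{matrix}
\qquad
\begin{aligned}
      & y_1\,[\,1, 3, 5, 0, 7\,] \\
 +\;  & y_2\,[\,0, 0, 0, 1, 2\,] \\
 +\;  & y_3\,[\,0, 0, 0, 0, 0\,] \\
 =\;  & [\,0, 0, 0, 0, 0\,]
\end{aligned}
\qquad (1)
$$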


The solutions $y_1, y_2, y_3$ are pretty clear. We need $y_1 = 0$ and $y_2 = 0$. The variable $y_3$ is 
free (it can be anything). The nullspace of $R^T$ contains all vectors $y = (0, 0, y_3)$. 

In all cases R ends with m - r zero rows. Every combination of these m - r rows 
gives zero. These are the only combinations of the rows of R that give zero, because the 
pivot rows are linearly independent. So y in the left nullspace has $y_1 = 0, \ldots, y_r = 0$. 


If A is m by n of rank r, its left nullspace has dimension m - r. 


Why is this a “left nullspace”? The reason is that $R^Ty = 0$ can be transposed to 
$y^TR = 0^T$. Now $y^T$ is a row vector to the left of R. You see the y’s in equation (1) 
multiplying the rows. This subspace came fourth, and some linear algebra books omit 
it, but that misses the beauty of the whole subject. 


In $R^n$ the row space and nullspace have dimensions r and n - r (adding to n). 
In $R^m$ the column space and left nullspace have dimensions r and m - r (total m). 

The Four Subspaces for A 


We have a job still to do. The subspace dimensions for A are the same as for R. 
The job is to explain why. A is now any matrix that reduces to R = rref(A). 


$$
\text{This } A \text{ reduces to } R
\qquad
A = \begin{bmatrix} 1 & 3 & 5 & 0 & 7 \\ 0 & 0 & 0 & 1 & 2 \\ 1 & 3 & 5 & 1 & 9 \end{bmatrix}
\qquad
\text{Notice } C(A) \neq C(R)!
\qquad (2)
$$


[Figure: the big picture. In $R^n$: the row space $C(A^T)$ (all $A^Ty$, dimension r) and the nullspace N(A) (Ax = 0, dimension n - r). In $R^m$: the column space C(A) (all Ax, dimension r) and the left nullspace $N(A^T)$ ($A^Ty = 0$, dimension m - r).] 

Figure 3.5: The dimensions of the Four Fundamental Subspaces (for R and for A). 


1 A has the same row space as R. Same dimension r and same basis. 


Reason: Every row of A is a combination of the rows of R. Also every row of R is a 
combination of the rows of A. Elimination changes rows, but not row spaces. 

Since A has the same row space as R, we can choose the first r rows of R as a basis. 
Or we could choose r suitable rows of the original A. They might not always be the first r 
rows of A, because those could be dependent. The good r rows of A are the ones that end 
up as pivot rows in R. 


2 The column space of A has dimension r. The column rank equals the row rank. 


Rank Theorem: The number of independent columns=the number of independent rows. 


Wrong reason: “A and R have the same column space.” This is false. The columns of R 
often end in zeros. The columns of A don’t often end in zeros. Then C(A) is not C(R). 

Right reason: The same combinations of the columns are zero (or nonzero) for A and R. 
Dependent in A $\Leftrightarrow$ dependent in R. Say that another way: Ax = 0 exactly when Rx = 0. 
The column spaces are different, but their dimensions are the same, equal to r. 


Conclusion The r pivot columns of A are a basis for its column space C(A). 
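
To see the conclusion on the 3 by 5 example, the sketch below reduces A to R with exact fractions and then reads the pivot columns from A, not from R. It is a minimal Gauss-Jordan routine written for illustration (the name `rref` is borrowed from the text's notation, and the pivot indices it returns are 0-based).

```python
from fractions import Fraction

def rref(rows):
    """Return (R, pivots): the reduced row echelon form and 0-based pivot columns."""
    M = [[Fraction(x) for x in row] for row in rows]
    m, n = len(M), len(M[0])
    pivots, r = [], 0                      # r is the index of the next pivot row
    for c in range(n):
        # look for a nonzero entry in column c at or below row r
        p = next((i for i in range(r, m) if M[i][c] != 0), None)
        if p is None:
            continue                       # no pivot here: column c is free
        M[r], M[p] = M[p], M[r]            # row exchange if needed
        piv = M[r][c]
        M[r] = [x / piv for x in M[r]]     # scale the pivot to 1
        for i in range(m):
            if i != r and M[i][c] != 0:    # clear the rest of column c
                f = M[i][c]
                M[i] = [a - f * b for a, b in zip(M[i], M[r])]
        pivots.append(c)
        r += 1
        if r == m:
            break
    return M, pivots

A = [[1, 3, 5, 0, 7],
     [0, 0, 0, 1, 2],
     [1, 3, 5, 1, 9]]

R, pivots = rref(A)
basis_for_CA = [[row[j] for row in A] for j in pivots]   # columns taken from A
print("pivot columns (0-based):", pivots)
print("basis for C(A):", basis_for_CA)
print("R:", [[int(x) for x in row] for row in R])
```

The basis vectors end in 1, so they cannot lie in C(R), whose vectors all end in 0. The dimensions still agree.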


3 A has the same nullspace as R. Same dimension n — r and same basis. 


Reason: The elimination steps don’t change the solutions. The special solutions are a basis 
for this nullspace (as we always knew). There are n - r free variables, so the dimension 
of the nullspace is n - r. This is the Counting Theorem: r + (n - r) equals n. 


(dimension of column space) + (dimension of nullspace) = dimension of $R^n$. 


4 The left nullspace of A (the nullspace of $A^T$) has dimension m - r. 


Reason: $A^T$ is just as good a matrix as A. When we know the dimensions for every A, 
we also know them for $A^T$. Its column space was proved to have dimension r. Since $A^T$ 
is n by m, the “whole space” is now $R^m$. The counting rule for A was r + (n - r) = n. 
The counting rule for $A^T$ is r + (m - r) = m. We now have all details of a big theorem: 


Fundamental Theorem of Linear Algebra, Part 1 


The column space and row space both have dimension r. 


The nullspaces have dimensions n — r and m — r. 


By concentrating on spaces of vectors, not on individual numbers or vectors, we get these 
clean rules. You will soon take them for granted—eventually they begin to look obvious. 
But if you write down an 11 by 17 matrix with 187 nonzero entries, I don’t think most 
people would see why these facts are true: 


Two key facts: 
dimension of C(A) = dimension of $C(A^T)$ = rank of A 
dimension of C(A) + dimension of N(A) = 17. 


Example 1 A = [1 2 3] has m = 1 and n = 3 and rank r = 1. 


The row space is a line in $R^3$. The nullspace is the plane $Ax = x_1 + 2x_2 + 3x_3 = 0$. This 
plane has dimension 2 (which is 3 - 1). The dimensions add to 1 + 2 = 3. 

The columns of this 1 by 3 matrix are in $R^1$! The column space is all of $R^1$. The left 
nullspace contains only the zero vector. The only solution to $A^Ty = 0$ is y = 0; no other 
multiple of [1 2 3] gives the zero row. Thus $N(A^T)$ is Z, the zero space with dimension 
0 (which is m - r). In $R^m$ the dimensions of C(A) and $N(A^T)$ add to 1 + 0 = 1. 

Example 2 $A = \begin{bmatrix} 1 & 2 & 3 \\ 2 & 4 & 6 \end{bmatrix}$ has m = 2 with n = 3 and rank r = 1. 


The row space is the same line through (1, 2, 3). The nullspace must be the same plane 
$x_1 + 2x_2 + 3x_3 = 0$. The line and plane dimensions still add to 1 + 2 = 3. 

All columns are multiples of the first column (1, 2). Twice the first row minus the 
second row is the zero row. Therefore $A^Ty = 0$ has the solution y = (2, -1). The column 
space and left nullspace are perpendicular lines in $R^2$. Dimensions 1 + 1 = 2. 


Column space = line through $\begin{bmatrix} 1 \\ 2 \end{bmatrix}$ Left nullspace = line through $\begin{bmatrix} 2 \\ -1 \end{bmatrix}$ 
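
The two statements in Example 2 are easy to verify directly. The short sketch below checks that y = (2, -1) combines the rows of A into the zero row, and that y is perpendicular to the column (1, 2). The variable names are chosen here for illustration.

```python
A = [[1, 2, 3],
     [2, 4, 6]]

y = [2, -1]                                   # candidate vector in the left nullspace

# y^T A is a combination of the rows: 2 (row 1) - 1 (row 2)
row_combination = [y[0] * a + y[1] * b for a, b in zip(A[0], A[1])]

# The first column (1, 2) spans the column space; check perpendicularity with y
first_column = [A[0][0], A[1][0]]
dot = sum(c * yi for c, yi in zip(first_column, y))

print("y^T A =", row_combination)
print("(1, 2) . (2, -1) =", dot)
```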


If A has three equal rows, its rank is ____. What are two of the y’s in its left nullspace? 
The y’s in the left nullspace combine the rows to give the zero row. 


Example 3 You have nearly finished three chapters with made-up equations, and this 
can’t continue forever. Here is a better example of five equations (one for every edge in 
Figure 3.6). The five equations have four unknowns (one for every node). The matrix in 
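Ax = b is an incidence matrix. This matrix A has 1 and -1 on every row. 


$$
\begin{matrix} \text{Differences } Ax = b \\ \text{across edges } 1, 2, 3, 4, 5 \\ \text{between nodes } 1, 2, 3, 4 \end{matrix}
\qquad
\begin{aligned}
-x_1 + x_2 &= b_1 \\
-x_1 + x_3 &= b_2 \\
-x_2 + x_3 &= b_3 \\
-x_2 + x_4 &= b_4 \\
-x_3 + x_4 &= b_5
\end{aligned}
\qquad (3)
$$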

If you understand the four fundamental subspaces for this matrix (the column spaces and 
the nullspaces for A and AT) you have captured the central ideas of linear algebra. 


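$$
A = \begin{bmatrix}
-1 & 1 & 0 & 0 \\
-1 & 0 & 1 & 0 \\
0 & -1 & 1 & 0 \\
0 & -1 & 0 & 1 \\
0 & 0 & -1 & 1
\end{bmatrix}
\quad
\begin{matrix} \text{edges} \\ 1 \\ 2 \\ 3 \\ 4 \\ 5 \end{matrix}
$$

[Figure: a graph with four nodes $x_1, x_2, x_3, x_4$ and five edges.] 

Figure 3.6: A “graph” with 5 edges and 4 nodes. A is its 5 by 4 incidence matrix. 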


The nullspace N(A) To find the nullspace we set b = 0. Then the first equation 
says $x_1 = x_2$. The second equation is $x_3 = x_1$. Equation 4 is $x_2 = x_4$. All four unknowns 
$x_1, x_2, x_3, x_4$ have the same value c. The vectors x = (c, c, c, c) fill the nullspace of A. 


That nullspace is a line in $R^4$. The special solution x = (1, 1, 1, 1) is a basis for 
N(A). The dimension of N(A) is 1 (one vector in the basis). The rank of A must be 3, 
since n - r = 4 - 3 = 1. We now know the dimensions of all four subspaces. 

The column space C(A) There must be r = 3 independent columns. The fast way 
is to look at the first 3 columns. The systematic way is to find R = rref(A). 


$$
\begin{matrix} \text{Columns} \\ 1, 2, 3 \\ \text{of } A \end{matrix}
\quad
\begin{bmatrix} -1 & 1 & 0 \\ -1 & 0 & 1 \\ 0 & -1 & 1 \\ 0 & -1 & 0 \\ 0 & 0 & -1 \end{bmatrix}
\qquad
R = \begin{matrix} \text{reduced row} \\ \text{echelon form} \end{matrix}
= \begin{bmatrix} 1 & 0 & 0 & -1 \\ 0 & 1 & 0 & -1 \\ 0 & 0 & 1 & -1 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{bmatrix}
$$


From R we see again the special solution x = (1, 1, 1, 1). The first 3 columns are basic, 
the fourth column is free. To produce a basis for C(A) and not C(R), we go back to 
columns 1, 2, 3 of A. The column space has dimension r = 3. 


The row space C (AT) The dimension must again be r = 3. But the first 3 rows of 
A are not independent: row 3 = row 2 — row 1. So row 3 became zero in elimination, 
and row 3 was exchanged with row 4. The first three independent rows are rows 1, 2,4. 
Those three rows are a basis (one possible basis) for the row space. 


I notice that edges 1,2,3 form a loop in the picture: Dependent rows 1, 2,3. 
Edges 1, 2,4 form a tree in the picture. Trees have no loops! Independent rows 1, 2, 4. 


The left nullspace N(AT) Now we solve ATy = 0. Combinations of the rows 
give zero. We already noticed that row 3 = row 2 — row 1, so one solution is y = 
(1,—1,1,0,0). I would say: That y comes from following the upper loop in the picture. 
Another y comes from going around the lower loop and it is y = (0,0,—1,1,—1): 
row 3 = row 4 — row 5. Those two y’s are independent, they solve ATy = 0, and the 
dimension of N (AT) is m — r = 5 — 3 = 2. So we have a basis for the left nullspace. 
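
The nullspace vector and the two loop vectors can be checked against the incidence matrix directly. The sketch below assumes the matrix A of Figure 3.6 as reconstructed from equations (3). The helper names `matvec` and `transpose` are illustrative.

```python
# Incidence matrix: one row per edge, one column per node
A = [[-1,  1,  0,  0],   # edge 1: -x1 + x2
     [-1,  0,  1,  0],   # edge 2: -x1 + x3
     [ 0, -1,  1,  0],   # edge 3: -x2 + x3
     [ 0, -1,  0,  1],   # edge 4: -x2 + x4
     [ 0,  0, -1,  1]]   # edge 5: -x3 + x4

def matvec(M, x):
    return [sum(a * b for a, b in zip(row, x)) for row in M]

def transpose(M):
    return [list(col) for col in zip(*M)]

x = [1, 1, 1, 1]                 # equal values at all nodes
upper_loop = [1, -1, 1, 0, 0]    # row 1 - row 2 + row 3 = 0
lower_loop = [0, 0, -1, 1, -1]   # -row 3 + row 4 - row 5 = 0

print("A x        =", matvec(A, x))
print("A^T y (up) =", matvec(transpose(A), upper_loop))
print("A^T y (lo) =", matvec(transpose(A), lower_loop))
```

All three products are zero vectors. The two loop vectors are not multiples of each other, so together they give the m - r = 2 basis vectors for the left nullspace.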


You may ask how “loops” and “trees” got into this problem. That didn’t have to happen. 
We could have used elimination to solve $A^Ty = 0$. The 4 by 5 matrix $A^T$ would have three 
pivot columns 1,2, 4 and two free columns 3,5. There are two special solutions and the 
nullspace of AT has dimension two: m — r = 5 — 3 = 2. But loops and trees identify 
dependent rows and independent rows in a beautiful way. We use them in Section 10.1 for 
every incidence matrix like this A. 


The equations Ax = b give “voltages” $x_1, x_2, x_3, x_4$ at the four nodes. The equations 
$A^Ty = 0$ give “currents” $y_1, y_2, y_3, y_4, y_5$ on the five edges. These two equations are 
Kirchhoff’s Voltage Law and Kirchhoff’s Current Law. Those words apply to an electrical 
network. But the ideas behind the words apply all over engineering and science and 
economics and business. 


Graphs are the most important model in discrete applied mathematics. You see graphs 
everywhere: roads, pipelines, blood flow, the brain, the Web, the economy of a country or 
the world. We can understand their matrices A and $A^T$. 

Rank One Matrices (Review) 


Suppose every row is a multiple of the first row. Here is a typical example: 


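$$
\begin{bmatrix} 2 & 3 & 7 & 8 \\ 2a & 3a & 7a & 8a \\ 2b & 3b & 7b & 8b \end{bmatrix}
=
\begin{bmatrix} 1 \\ a \\ b \end{bmatrix}
\begin{bmatrix} 2 & 3 & 7 & 8 \end{bmatrix}
$$


On the left is a matrix with three rows. But its row space only has dimension = 1. 
The row vector $v^T = [\,2 \;\; 3 \;\; 7 \;\; 8\,]$ tells us a basis for that row space. The row rank is 1. 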

Now look at the columns. “The column rank equals the row rank which is 1.” 
All columns of the matrix must be multiples of one column. Do you see that this key 
rule of linear algebra is true? The column vector u = (1, a,b) is multiplied by 2, 3, 7,8. 
That nonzero vector u is a basis for the column space. The column rank is also 1. 


Every rank one matrix is one column times one row $A = uv^T$ 


Rank Two Matrices = Rank One plus Rank One 


Here is a matrix A of rank r = 2. We can’t see r immediately from A. So we reduce 
the matrix by row operations to R = rref(A). Some elimination matrix E simplifies A to 
EA = R. Then the inverse matrix $C = E^{-1}$ connects R back to A = CR. 

You know the main point already: R has the same row space as A. 


$$
\begin{matrix} \text{Rank} \\ \text{two} \end{matrix}
\qquad
A = \begin{bmatrix} 1 & 0 & 3 \\ 1 & 1 & 7 \\ 4 & 2 & 20 \end{bmatrix}
= \begin{bmatrix} 1 & 0 & 0 \\ 1 & 1 & 0 \\ 4 & 2 & 1 \end{bmatrix}
\begin{bmatrix} 1 & 0 & 3 \\ 0 & 1 & 4 \\ 0 & 0 & 0 \end{bmatrix}
= CR. \qquad (4)
$$

The row space of R clearly has two basis vectors $v_1^T = [\,1 \;\; 0 \;\; 3\,]$ and $v_2^T = [\,0 \;\; 1 \;\; 4\,]$. 
So the (same!) row space of A also has this basis: row rank = 2. Multiplying C times R 
says that row 3 of A is $4v_1^T + 2v_2^T$. 

Now look at columns. The pivot columns of R are clearly (1, 0, 0) and (0, 1, 0). 
Then the pivot columns of A are also in columns 1 and 2: $u_1 = (1, 1, 4)$ and $u_2 = (0, 1, 2)$. 
Notice that C has those same first two columns! That was guaranteed since multiplying 
by two columns of the identity matrix (in R) won’t change the pivot columns $u_1$ and $u_2$. 


When you put in letters for the columns and rows, you see rank 2 = rank 1 + rank 1. 


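$$
\begin{matrix} \text{Matrix } A \\ \text{Rank two} \end{matrix}
\qquad
A = \begin{bmatrix} u_1 & u_2 & u_3 \end{bmatrix}
\begin{bmatrix} v_1^T \\ v_2^T \\ \text{zero row} \end{bmatrix}
= u_1v_1^T + u_2v_2^T = (\text{rank 1}) + (\text{rank 1}).
$$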


Did you see that last step? I multiplied the matrices using columns times rows. 

That was perfect for this problem. Every rank r matrix is a sum of r rank one matrices: 

Pivot columns of A times nonzero rows of R. The row [0 0 0] simply disappeared. 
The pivot columns $u_1$ and $u_2$ are a basis for the column space, which you knew. 
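
The columns-times-rows multiplication can be checked on the matrix in equation (4). The sketch below forms the two rank one pieces and adds them. The helper names `outer` and `add` are illustrative.

```python
def outer(u, v):
    """Column u times row v: the rank one matrix u v^T."""
    return [[ui * vj for vj in v] for ui in u]

def add(M, N):
    return [[a + b for a, b in zip(r, s)] for r, s in zip(M, N)]

A = [[1, 0, 3],
     [1, 1, 7],
     [4, 2, 20]]

u1, u2 = [1, 1, 4], [0, 1, 2]    # pivot columns of A
v1, v2 = [1, 0, 3], [0, 1, 4]    # nonzero rows of R

piece1 = outer(u1, v1)
piece2 = outer(u2, v2)
total = add(piece1, piece2)

print(piece1)
print(piece2)
print("sum equals A:", total == A)
```

The third column of C and the zero row of R contribute nothing, which is why only two pieces appear.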
= REVIEW OF THE KEY IDEAS = 


1. The r pivot rows of R are a basis for the row spaces of R and A (same space). 
2. The r pivot columns of A (!) are a basis for its column space C(A). 
3. The n − r special solutions are a basis for the nullspaces of A and R (same space). 
4. If EA = R, the last m − r rows of E are a basis for the left nullspace of A. 
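Key idea 4 can be tried on the rank-two matrix A = CR of equation (4). In this sketch the matrix E is assumed to be the inverse of C, worked out by hand here rather than taken from the text; `matmul` is an illustrative helper.

```python
# With EA = R, the last m - r = 1 row of E should be in the left nullspace of A.

def matmul(X, Y):
    """Product of two matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

A = [[1, 0, 3], [1, 1, 7], [4, 2, 20]]
# Assumption of this sketch: E = C^{-1} for C = [[1,0,0],[1,1,0],[4,2,1]].
E = [[1, 0, 0], [-1, 1, 0], [-2, -2, 1]]

R = matmul(E, A)          # the reduced form, ending in a zero row
y = E[2]                  # last row of E
yTA = matmul([y], A)[0]   # the row vector y^T A

print(R)
print(yTA)
```

The zero row of R and the zero vector $y^TA$ are the same statement: row 3 of E combines the rows of A to give zero.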


Note about the four subspaces The Fundamental Theorem looks like pure algebra, but 
it has very important applications. My favorites are the networks in Chapter 10 (often 
I go to 10.1 for my next lecture). The equation for y in the left nullspace is $A^Ty = 0$: 


Flow into a node equals flow out. Kirchhoff’s Current Law is the “balance equation”. 


This must be the most important equation in applied mathematics. All models in science 
and engineering and economics involve a balance—of force or heat flow or charge or mo- 
mentum or money. That balance equation, plus Hooke’s Law or Ohm’s Law or some 
law connecting “potentials” to “flows”, gives a clear framework for applied mathematics. 

My textbook on Computational Science and Engineering develops that framework, 
together with algorithms to solve the equations: Finite differences, finite elements, 
spectral methods, iterative methods, and multigrid. 


= WORKED EXAMPLES = 


3.5 A Put four 1’s into a 5 by 6 matrix of zeros, keeping the dimension of its row space 
as small as possible. Describe all the ways to make the dimension of its column space as 
small as possible. Describe all the ways to make the dimension of its nullspace as small as 
possible. How to make the sum of the dimensions of all four subspaces small? 


Solution The rank is 1 if the four 1’s go into the same row, or into the same column. 
They can also go into two rows and two columns (so $a_{ii} = a_{ij} = a_{ji} = a_{jj} = 1$). 
Since the column space and row space always have the same dimensions, this answers the 
first two questions: Dimension 1. 

The nullspace has its smallest possible dimension 6 − 4 = 2 when the rank is r = 4. 
To achieve rank 4, the 1’s must go into four different rows and four different columns. 

You can’t do anything about the sum r + (n − r) + r + (m − r) = n + m. It will be 
6 + 5 = 11 no matter how the 1’s are placed. The sum is 11 even if there aren’t any 1’s... 


If all the other entries of A are 2’s instead of 0’s, how do these answers change? 
3.5 B Fact: All the rows of AB are combinations of the rows of B. So the row space of 
AB is contained in (possibly equal to) the row space of B. Rank (AB) ≤ rank (B). 
All columns of AB are combinations of the columns of A. So the column space of 
AB is contained in (possibly equal to) the column space of A. Rank (AB) ≤ rank (A). 
If we multiply by an invertible matrix, the rank will not change. The rank can’t drop, 
because when we multiply by the inverse matrix the rank can’t jump back. 
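These rank facts can be tried on small matrices. The sketch below is only an illustration: `rank` counts pivots by exact elimination with fractions, and the test matrices are arbitrary choices, not from the text.

```python
from fractions import Fraction

def rank(M):
    """Number of pivots found by forward elimination (exact arithmetic)."""
    rows = [[Fraction(x) for x in row] for row in M]
    r = 0
    for c in range(len(rows[0])):
        pivot = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if pivot is None:
            continue                      # no pivot in this column
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(r + 1, len(rows)):
            f = rows[i][c] / rows[r][c]
            rows[i] = [a - f * b for a, b in zip(rows[i], rows[r])]
        r += 1
        if r == len(rows):
            break
    return r

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

A = [[1, 2], [3, 4]]            # invertible, rank 2
B = [[1, 2, 3], [2, 4, 6]]      # rank 1
print(rank(A), rank(B), rank(matmul(A, B)))

A2 = [[1, 1], [1, 1]]           # rank 1
B2 = [[1, -1], [-1, 1]]         # rank 1
print(rank(matmul(A2, B2)))     # the rank can drop below both factors
```

Multiplying B by the invertible A leaves the rank at 1, while the product of the two rank-one matrices A2 and B2 is the zero matrix.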


Problem Set 3.5 


1 (a) If a 7 by 9 matrix has rank 5, what are the dimensions of the four subspaces? 
What is the sum of all four dimensions? 


(b) If a 3 by 4 matrix has rank 3, what are its column space and left nullspace? 


2 Find bases and dimensions for the four subspaces associated with A and B: 
1 2 4 lL 2 4 
Dir ae Bale al 
3 Find a basis for each of the four subspaces associated with A: 
0 1 2-3 4 1 O O10 1 2 3.4 
As |) L 274 oe) =) eh Oe de 
oo & 1 2 OT 1/10 0. 0-070 
4 Construct a matrix with the required property or explain why this is impossible: 


LTO : 
(a) Column space contains l | ; [8 | , row space contains EA lel. 


(b) Column space has basis H , nullspace has basis H : 


(c) Dimension of nullspace = 1 + dimension of left nullspace. 
(d) Nullspace contains iat column space contains Eve 


(e) Row space = column space, nullspace ≠ left nullspace. 


5 If V is the subspace spanned by ( 14,1) and ( 2,0), find a matrix A that has 
V as its row space. Find a matrix B that has V as its nullspace. Multiply AB. 


6 Without using elimination, find dimensions and bases for the four subspaces for 
03 3 3 1 
A=|0 0 0 0 and B= |4 
es | aan 5 


7 Suppose the 3 by 3 matrix A is invertible. Write down bases for the four subspaces 
for A, and also for the 3 by 6 matrix B = [A A]. (The basis for Z is empty.) 
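8 What are the dimensions of the four subspaces for A, B, and C, if I is the 3 by 3 
identity matrix and 0 is the 3 by 2 zero matrix? 

$$A = \begin{bmatrix} I & 0 \end{bmatrix} \quad\text{and}\quad B = \begin{bmatrix} I & I \\ 0^T & 0^T \end{bmatrix} \quad\text{and}\quad C = \begin{bmatrix} 0 \end{bmatrix}.$$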


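9 Which subspaces are the same for these matrices of different sizes? 

(a) $\begin{bmatrix} A \end{bmatrix}$ and $\begin{bmatrix} A \\ A \end{bmatrix}$ (b) $\begin{bmatrix} A \\ A \end{bmatrix}$ and $\begin{bmatrix} A & A \\ A & A \end{bmatrix}$

Prove that all three of those matrices have the same rank r. 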


10 If the entries of a 3 by 3 matrix are chosen randomly between 0 and 1, what are the 
most likely dimensions of the four subspaces? What if the random matrix is 3 by 5? 


11 (Important) A is an m by n matrix of rank r. Suppose there are right sides b for 
which Ax = b has no solution. 

(a) What are all inequalities (< or ≤) that must be true between m, n, and r? 

(b) How do you know that $A^Ty = 0$ has solutions other than y = 0? 


12 Construct a matrix with (1, 0, 1) and (1, 2, 0) as a basis for its row space and its 
column space. Why can’t this be a basis for the row space and nullspace? 


13 True or false (with a reason or a counterexample): 

(a) If m = n then the row space of A equals the column space. 
(b) The matrices A and −A share the same four subspaces. 
(c) If A and B share the same four subspaces then A is a multiple of B. 


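14 Without computing A, find bases for its four fundamental subspaces: 
$$A = \begin{bmatrix} 1 & 0 & 0 \\ 6 & 1 & 0 \\ 9 & 8 & 1 \end{bmatrix} \begin{bmatrix} 1 & 2 & 3 & 4 \\ 0 & 1 & 2 & 3 \\ 0 & 0 & 1 & 2 \end{bmatrix}$$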


15 If you exchange the first two rows of A, which of the four subspaces stay the same? 
If v = (1, 2, 3, 4) is in the left nullspace of A, write down a vector in the left nullspace 
of the new matrix after the row exchange. 


16 Explain why v = (1, 0, −1) cannot be a row of A and also in the nullspace. 


17 Describe the four subspaces of $R^3$ associated with 


18 (Left nullspace) Add the extra column b and reduce A to echelon form: 

$$\begin{bmatrix} A & b \end{bmatrix} = \begin{bmatrix} 1 & 2 & 3 & b_1 \\ 4 & 5 & 6 & b_2 \\ 7 & 8 & 9 & b_3 \end{bmatrix} \rightarrow \begin{bmatrix} 1 & 2 & 3 & b_1 \\ 0 & -3 & -6 & b_2 - 4b_1 \\ 0 & 0 & 0 & b_3 - 2b_2 + b_1 \end{bmatrix}$$

A combination of the rows of A has produced the zero row. What combination is it? 
(Look at $b_3 - 2b_2 + b_1$ on the right side.) Which vectors are in the nullspace of $A^T$ 
and which vectors are in the nullspace of A? 


19 Following the method of Problem 18, reduce A to echelon form and look at zero 
rows. The b column tells which combinations you have taken of the rows: 


12a) a 
(a) |3 4 b (b) : 
EE w. 
3 2 5 bs 


From the b column after elimination, read off m − r basis vectors in the left nullspace. 
Those y’s are combinations of rows that give zero rows in the echelon form. 


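20 (a) Check that the solutions to Ax = 0 are perpendicular to the rows of A: 
$$A = \begin{bmatrix} 1 & 0 & 0 \\ 2 & 1 & 0 \\ 3 & 4 & 1 \end{bmatrix} \begin{bmatrix} 4 & 2 & 0 & 1 \\ 0 & 0 & 1 & 3 \\ 0 & 0 & 0 & 0 \end{bmatrix} = ER.$$

(b) How many independent solutions to $A^Ty = 0$? Why does $y^T$ = row 3 of $E^{-1}$? 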


21 Suppose A is the sum of two matrices of rank one: $A = uv^T + wz^T$. 

(a) Which vectors span the column space of A? 

(b) Which vectors span the row space of A? 

(c) The rank is less than 2 if ____ or if ____. 

(d) Compute A and its rank if u = z = (1, 0, 0) and v = w = (0, 0, 1). 


22 Construct $A = uv^T + wz^T$ whose column space has basis (1, 2, 4), (2, 2, 1) and 
whose row space has basis (1, 0), (1, 1). Write A as (3 by 2) times (2 by 2). 


23 Without multiplying matrices, find bases for the row and column spaces of A: 


1 2 
fe elle al 
2 7 


How do you know from these shapes that A cannot be invertible? 


24 (Important) $A^Ty = d$ is solvable when d is in which of the four subspaces? The 
solution y is unique when the ____ contains only the zero vector. 


25 True or false (with a reason or a counterexample): 

(a) A and $A^T$ have the same number of pivots. 

(b) A and $A^T$ have the same left nullspace. 

(c) If the row space equals the column space then $A^T = A$. 

(d) If $A^T = -A$ then the row space of A equals the column space. 


26 If a, b, c are given with a ≠ 0, how would you choose d so that $\begin{bmatrix} a & b \\ c & d \end{bmatrix}$ has rank 1? 
Find a basis for the row space and nullspace. Show they are perpendicular! 


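27 Find the ranks of the 8 by 8 checkerboard matrix B and the chess matrix C: 

$$B = \begin{bmatrix} 1 & 0 & 1 & 0 & 1 & 0 & 1 & 0 \\ 0 & 1 & 0 & 1 & 0 & 1 & 0 & 1 \\ 1 & 0 & 1 & 0 & 1 & 0 & 1 & 0 \\ \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot \\ 0 & 1 & 0 & 1 & 0 & 1 & 0 & 1 \end{bmatrix} \quad\text{and}\quad C = \begin{bmatrix} r & n & b & q & k & b & n & r \\ p & p & p & p & p & p & p & p \\ & & & \text{four zero rows} & & & & \\ p & p & p & p & p & p & p & p \\ r & n & b & q & k & b & n & r \end{bmatrix}$$

The numbers r, n, b, q, k, p are all different. Find bases for the row space and left 
nullspace of B and C. Challenge problem: Find a basis for the nullspace of C. 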


28 Can tic-tac-toe be completed (5 ones and 4 zeros in A) so that rank (A) = 2 but 
neither side passed up a winning move? 


Challenge Problems 


29 If $A = uv^T$ is a 2 by 2 matrix of rank 1, redraw Figure 3.5 to show clearly the Four 
Fundamental Subspaces. If B produces those same four subspaces, what is the exact 
relation of B to A? 


30 M is the space of 3 by 3 matrices. Multiply every matrix X in M by 

$$A = \begin{bmatrix} 1 & 0 & -1 \\ -1 & 1 & 0 \\ 0 & -1 & 1 \end{bmatrix}. \quad \text{Notice: } A \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix}.$$

(a) Which matrices X lead to AX = zero matrix? 
(b) Which matrices have the form AX for some matrix X? 

(a) finds the “nullspace” of that operation AX and (b) finds the “column space”. 
What are the dimensions of those two subspaces of M? Why do the dimensions add 
to (n − r) + r = 9? 


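31 Suppose the m by n matrices A and B have the same four subspaces. If they are 
both in row reduced echelon form, prove that F must equal G: 

$$A = \begin{bmatrix} I & F \\ 0 & 0 \end{bmatrix} \qquad B = \begin{bmatrix} I & G \\ 0 & 0 \end{bmatrix}$$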


Chapter 4 


Orthogonality 


4.1 Orthogonality of the Four Subspaces 


1 Orthogonal vectors have $v^Tw = 0$. Then $\|v\|^2 + \|w\|^2 = \|v + w\|^2 = \|v - w\|^2$. 

2 Subspaces V and W are orthogonal when $v^Tw = 0$ for every v in V and every w in W. 

3 The row space of A is orthogonal to the nullspace. The column space is orthogonal to $N(A^T)$. 

4 One pair of dimensions adds to r + (n − r) = n. The other pair has r + (m − r) = m. 

5 Row space and nullspace are orthogonal complements: Every x in $R^n$ splits into $x_{\text{row}} + x_{\text{null}}$. 

6 Suppose a space S has dimension d. Then every basis for S consists of d vectors. 

7 If d vectors in S are independent, they span S. If d vectors span S, they are independent. 


Two vectors are orthogonal when their dot product is zero: $v \cdot w = v^Tw = 0$. This 
chapter moves to orthogonal subspaces and orthogonal bases and orthogonal matrices. 
The vectors in two subspaces, and the vectors in a basis, and the column vectors in Q, 
all pairs will be orthogonal. Think of $a^2 + b^2 = c^2$ for a right triangle with sides v and w. 


Orthogonal vectors    $v^Tw = 0$ and $\|v\|^2 + \|w\|^2 = \|v + w\|^2$. 


The right side is $(v + w)^T(v + w)$. This equals $v^Tv + w^Tw$ when $v^Tw = w^Tv = 0$. 
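A quick numerical check of that right-triangle identity, as a sketch in plain Python. The two vectors are arbitrary choices (not from the text) picked so that their dot product is zero.

```python
def dot(v, w):
    """Dot product v^T w of two vectors stored as lists."""
    return sum(a * b for a, b in zip(v, w))

v = [1, 2, 2]
w = [2, -1, 0]                       # chosen so that v . w = 0

plus = [a + b for a, b in zip(v, w)]
minus = [a - b for a, b in zip(v, w)]

print(dot(v, w))                     # v^T w
print(dot(v, v) + dot(w, w))         # ||v||^2 + ||w||^2
print(dot(plus, plus), dot(minus, minus))   # ||v + w||^2 and ||v - w||^2
```

The cross terms $2v^Tw$ vanish, so both $\|v + w\|^2$ and $\|v - w\|^2$ equal the sum of the squared lengths.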


Subspaces entered Chapter 3 to throw light on Ax = b. Right away we needed the 
column space and the nullspace. Then the light turned onto $A^T$, uncovering two more 
subspaces. Those four fundamental subspaces reveal what a matrix really does. 

A matrix multiplies a vector: A times x. At the first level this is only numbers. At 
the second level Ax is a combination of column vectors. The third level shows subspaces. 
But I don’t think you have seen the whole picture until you study Figure 4.2. 
The subspaces fit together to show the hidden reality of A times x. The 90° angles 
between subspaces are new—and we can say now what those right angles mean. 


The row space is perpendicular to the nullspace. Every row of A is perpendicular to 
every solution of Ax = 0. That gives the 90° angle on the left side of the figure. This 
perpendicularity of subspaces is Part 2 of the Fundamental Theorem of Linear Algebra. 

The column space is perpendicular to the nullspace of $A^T$. When b is outside the 
column space (when we want to solve Ax = b and can’t do it) then this nullspace of 
$A^T$ comes into its own. It contains the error $e = b - Ax$ in the “least-squares” solution. 
Least squares is the key application of linear algebra in this chapter. 


Part 1 of the Fundamental Theorem gave the dimensions of the subspaces. The row 
and column spaces have the same dimension r (they are drawn the same size). The two 
nullspaces have the remaining dimensions n − r and m − r. Now we will show that 
the row space and nullspace are orthogonal subspaces inside $R^n$. 


DEFINITION Two subspaces V and W of a vector space are orthogonal if every vector 
v in V is perpendicular to every vector w in W: 


Orthogonal subspaces    $v^Tw = 0$ for all v in V and all w in W. 


Example 1 The floor of your room (extended to infinity) is a subspace V. The line where 
two walls meet is a subspace W (one-dimensional). Those subspaces are orthogonal. 
Every vector up the meeting line of the walls is perpendicular to every vector in the floor. 


Example 2 Two walls look perpendicular but those two subspaces are not orthogonal! 
The meeting line is in both V and W, and this line is not perpendicular to itself. Two 
planes (dimensions 2 and 2 in $R^3$) cannot be orthogonal subspaces. 

When a vector is in two orthogonal subspaces, it must be zero. It is perpendicular to 
itself. It is v and it is w, so $v^Tv = 0$. This has to be the zero vector. 


[Figure 4.1 panels: orthogonal plane V and line W; non-orthogonal planes with $v^Tw \neq 0$.] 


Figure 4.1: Orthogonality is impossible when dim V + dim W > dim (whole space). 


The crucial examples for linear algebra come from the four fundamental subspaces. 
Zero is the only point where the nullspace meets the row space. More than that, the 
nullspace and row space of A meet at 90°. This key fact comes directly from Ax = 0: 
Every vector x in the nullspace is perpendicular to every row of A, because Ax = 0. 
The nullspace N(A) and the row space $C(A^T)$ are orthogonal subspaces of $R^n$. 


To see why x is perpendicular to the rows, look at Ax = 0. Each row multiplies x: 

$$Ax = \begin{bmatrix} \text{row } 1 \\ \vdots \\ \text{row } m \end{bmatrix} \begin{bmatrix} \ \\ x \\ \ \end{bmatrix} = \begin{bmatrix} 0 \\ \vdots \\ 0 \end{bmatrix} \qquad \begin{matrix} \leftarrow\ (\text{row } 1) \cdot x \text{ is zero} \\ \vdots \\ \leftarrow\ (\text{row } m) \cdot x \text{ is zero} \end{matrix} \tag{1}$$

The first equation says that row 1 is perpendicular to x. The last equation says that row m is 
perpendicular to x. Every row has a zero dot product with x. Then x is also perpendicular 
to every combination of the rows. The whole row space $C(A^T)$ is orthogonal to N(A). 

Here is a second proof of that orthogonality for readers who like matrix shorthand. 
The vectors in the row space are combinations $A^Ty$ of the rows. Take the dot product 
of $A^Ty$ with any x in the nullspace. These vectors are perpendicular: 


Nullspace orthogonal to row space    $$x^T(A^Ty) = (Ax)^Ty = 0^Ty = 0. \tag{2}$$


We like the first proof. You can see those rows of A multiplying x to produce zeros in 
equation (1). The second proof shows why A and $A^T$ are both in the Fundamental Theorem. 


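Example 3 The rows of A are perpendicular to x = (1, 1, −1) in the nullspace: 

$$Ax = \begin{bmatrix} 1 & 3 & 4 \\ 5 & 2 & 7 \end{bmatrix} \begin{bmatrix} 1 \\ 1 \\ -1 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix} \quad\text{gives the dot products}\quad \begin{matrix} 1 + 3 - 4 = 0 \\ 5 + 2 - 7 = 0 \end{matrix}$$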
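The two dot products of Example 3 can be reproduced in a few lines. This is a minimal sketch in plain Python; the particular combination of rows at the end is an arbitrary illustration, not from the text.

```python
def dot(v, w):
    return sum(a * b for a, b in zip(v, w))

A = [[1, 3, 4], [5, 2, 7]]    # the rows with dot products 1 + 3 - 4 and 5 + 2 - 7
x = [1, 1, -1]                # nullspace vector of Example 3

Ax = [dot(row, x) for row in A]
print(Ax)

# Any combination of the rows is also perpendicular to x.
# 2*(row 1) - 3*(row 2) is one arbitrary choice.
combo = [2 * a - 3 * b for a, b in zip(A[0], A[1])]
print(combo, dot(combo, x))
```

Because x is perpendicular to each row, it is perpendicular to every vector in the row space.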


Now we turn to the other two subspaces. In this example, the column space is all of $R^2$. 
The nullspace of $A^T$ is only the zero vector (orthogonal to every vector). The column space 
of A and the nullspace of $A^T$ are always orthogonal subspaces. 


Every vector y in the nullspace of $A^T$ is perpendicular to every column of A. 
The left nullspace $N(A^T)$ and the column space C(A) are orthogonal in $R^m$. 


Apply the original proof to $A^T$. The nullspace of $A^T$ is orthogonal to the row space of 
$A^T$, and the row space of $A^T$ is the column space of A. Q.E.D. 
For a visual proof, look at $A^Ty = 0$. Each column of A multiplies y to give 0: 


$$C(A) \perp N(A^T) \qquad A^Ty = \begin{bmatrix} (\text{column } 1)^T \\ \cdots \\ (\text{column } n)^T \end{bmatrix} \begin{bmatrix} \ \\ y \\ \ \end{bmatrix} = \begin{bmatrix} 0 \\ \cdot \\ 0 \end{bmatrix}. \tag{3}$$


The dot product of y with every column of A is zero. Then y in the left nullspace is 
perpendicular to each column of A—and to the whole column space. 
[Figure 4.2 diagram: in $R^n$, the row space of A (dimension = r) and the nullspace of A (dimension = n − r); in $R^m$, the column space of A (dimension = r) and the nullspace of $A^T$ (dimension = m − r). Row space to column space: $Ax_{\text{row}} = b$. Nullspace to 0: $Ax_{\text{null}} = 0$.] 


Figure 4.2: Two pairs of orthogonal subspaces. The dimensions add to n and add to m. 
This is the Big Picture: two subspaces in $R^n$ and two subspaces in $R^m$. 


Orthogonal Complements 


Important The fundamental subspaces are more than just orthogonal (in pairs). 
Their dimensions are also right. Two lines could be perpendicular in $R^3$, but those lines 
could not be the row space and nullspace of a 3 by 3 matrix. The lines have dimensions 
1 and 1, adding to 2. But the correct dimensions r and n − r must add to n = 3. 

The fundamental subspaces of a 3 by 3 matrix have dimensions 2 and 1, or 3 and 0. 
Those pairs of subspaces are not only orthogonal, they are orthogonal complements. 


DEFINITION The orthogonal complement of a subspace V contains every vector that is 
perpendicular to V. This orthogonal subspace is denoted by $V^\perp$ (pronounced “V perp”). 


By this definition, the nullspace is the orthogonal complement of the row space. 
Every x that is perpendicular to the rows satisfies Ax = 0, and lies in the nullspace. 


The reverse is also true. If v is orthogonal to the nullspace, it must be in the row 
space. Otherwise we could add this v as an extra row of the matrix, without changing its 
nullspace. The row space would grow, which breaks the law r + (n − r) = n. We conclude 
that the nullspace complement $N(A)^\perp$ is exactly the row space $C(A^T)$. 

In the same way, the left nullspace and column space are orthogonal in $R^m$, and they 
are orthogonal complements. Their dimensions r and m − r add to the full dimension m. 
Fundamental Theorem of Linear Algebra, Part 2 


N(A) is the orthogonal complement of the row space $C(A^T)$ (in $R^n$). 
$N(A^T)$ is the orthogonal complement of the column space C(A) (in $R^m$). 


Part 1 gave the dimensions of the subspaces. Part 2 gives the 90° angles between them. 
The point of “complements” is that every x can be split into a row space component $x_r$ 
and a nullspace component $x_n$. When A multiplies $x = x_r + x_n$, Figure 4.3 shows what 
happens to $Ax = Ax_r + Ax_n$: 


The nullspace component goes to zero: $Ax_n = 0$. 


The row space component goes to the column space: $Ax_r = Ax$. 


Every vector goes to the column space! Multiplying by A cannot do anything else. 
More than that: Every vector b in the column space comes from one and only one vector 
$x_r$ in the row space. Proof: If $Ax_r = Ax_r'$, the difference $x_r - x_r'$ is in the nullspace. 
It is also in the row space, where $x_r$ and $x_r'$ came from. This difference must be the zero 
vector, because the nullspace and row space are perpendicular. Therefore $x_r = x_r'$. 

There is an r by r invertible matrix hiding inside A, if we throw away the two nullspaces. 
From the row space to the column space, A is invertible. The “pseudoinverse” will invert 
that part of A in Section 7.4. 


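Example 4 Every matrix of rank r has an r by r invertible submatrix: 

$$A = \begin{bmatrix} 3 & 0 & 0 & 0 & 0 \\ 0 & 5 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 \end{bmatrix} \quad\text{contains the submatrix}\quad \begin{bmatrix} 3 & 0 \\ 0 & 5 \end{bmatrix}.$$

The other eleven zeros are responsible for the nullspaces. The rank of B is also r = 2: 

$$B = \begin{bmatrix} 1 & 2 & 3 & 4 & 5 \\ 1 & 2 & 4 & 5 & 6 \\ 1 & 2 & 4 & 5 & 6 \end{bmatrix} \quad\text{contains}\quad \begin{bmatrix} 1 & 3 \\ 1 & 4 \end{bmatrix} \quad\text{in the pivot rows and columns.}$$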
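The submatrix of B can be extracted and tested directly. In this sketch the pivot rows (1 and 2) and pivot columns (1 and 3) are written out by hand as zero-based indices, which is an assumption of the example rather than something the code computes.

```python
B = [[1, 2, 3, 4, 5],
     [1, 2, 4, 5, 6],
     [1, 2, 4, 5, 6]]

pivot_rows = [0, 1]    # rows 1 and 2 (zero-based), assumed from elimination by hand
pivot_cols = [0, 2]    # columns 1 and 3 (zero-based)

S = [[B[i][j] for j in pivot_cols] for i in pivot_rows]
det = S[0][0] * S[1][1] - S[0][1] * S[1][0]

print(S, det)
print(B[1] == B[2])    # row 3 repeats row 2, so the rank cannot reach 3
```

A nonzero determinant means the 2 by 2 submatrix is invertible, so the rank is at least 2. The repeated row keeps it from being 3.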
Every matrix can be diagonalized, when we choose the right bases for $R^n$ and $R^m$. This 
Singular Value Decomposition has become extremely important in applications. 


Let me repeat one clear fact. A row of A can’t be in the nullspace of A (except for 
a zero row). The only vector in two orthogonal subspaces is the zero vector. 


If a vector v is orthogonal to itself then v is the zero vector. 
[Figure 4.3 diagram: row space = all combinations of the rows (dim r); column space = all combinations of the columns (dim r), containing b; nullspace = all vectors orthogonal to the rows (dim n − r); left nullspace = all vectors orthogonal to the columns (dim m − r).] 


Figure 4.3: This update of Figure 4.2 shows the true action of A on $x = x_r + x_n$. 
Row space vector $x_r$ to column space, nullspace vector $x_n$ to zero. 


Drawing the Big Picture 


I don’t know the best way to draw the four subspaces in Figures 4.2 and 4.3. This big 
picture has to show the orthogonality of those subspaces. I can see a possible way to do 
it when a line meets a plane. Maybe Figure 4.4 also shows that those spaces are infinite, 
more clearly than the rectangles in Figure 4.3. But how do I draw a pair of two-dimensional 
subspaces in $R^4$, to show they are orthogonal to each other? Good ideas are welcome. 


[Figure 4.4 diagram: the nullspace of A is the line in the direction (1, 0, −1), orthogonal to the rows of A.] 


Figure 4.4: Row space of A = plane. Nullspace = orthogonal line. Dimensions 2 + 1 = 3. 
Combining Bases from Subspaces 


What follows are some valuable facts about bases. They were saved until now—when we 
are ready to use them. After a week you have a clearer sense of what a basis is (linearly 
independent vectors that span the space). Normally we have to check both properties. 
When the count is right, one property implies the other: 


Any n independent vectors in $R^n$ must span $R^n$. So they are a basis. 
Any n vectors that span $R^n$ must be independent. So they are a basis. 


Starting with the correct number of vectors, one property of a basis produces the other. 
This is true in any vector space, but we care most about $R^n$. When the vectors go into the 
columns of an n by n square matrix A, here are the same two facts: 


If the n columns of A are independent, they span $R^n$. So Ax = b is solvable. 
If the n columns span $R^n$, they are independent. So Ax = b has only one solution. 


Uniqueness implies existence and existence implies uniqueness. Then A is invertible. If 
there are no free variables, the solution x is unique. There must be n pivot columns. 
Then back substitution solves Ax = b (the solution exists). 


Starting in the opposite direction, suppose that Ax = b can be solved for every b 
(existence of solutions). Then elimination produced no zero rows. There are n pivots and 
no free variables. The nullspace contains only x = 0 (uniqueness of solutions). 


With bases for the row space and the nullspace, we have r + (n − r) = n vectors. 
This is the right number. Those n vectors are independent.* Therefore they span $R^n$. 


Each x is the sum $x_r + x_n$ of a row space vector $x_r$ and a nullspace vector $x_n$. 


The splitting in Figure 4.3 shows the key point of orthogonal complements: the dimensions 
add to n and all vectors are fully accounted for. 


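Example 5 For $A = \begin{bmatrix} 1 & 2 \\ 3 & 6 \end{bmatrix}$ split $x = \begin{bmatrix} 4 \\ 3 \end{bmatrix}$ into $x_r + x_n = \begin{bmatrix} 2 \\ 4 \end{bmatrix} + \begin{bmatrix} 2 \\ -1 \end{bmatrix}$. 


The vector (2, 4) is in the row space. The orthogonal vector (2, −1) is in the nullspace. 
The next section will compute this splitting for any A and x, by a projection. 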
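For a rank one matrix the splitting can already be computed by hand, since the row space is a single line. The sketch below takes A with rows (1, 2) and (3, 6) and x = (4, 3) as an assumed reading of Example 5. It projects x onto the row direction a = (1, 2) with the formula $x_r = a\,(a^Tx)/(a^Ta)$ and calls the remainder $x_n$.

```python
from fractions import Fraction

def dot(v, w):
    return sum(p * q for p, q in zip(v, w))

A = [[1, 2], [3, 6]]      # assumed matrix of Example 5 (rank one)
a = A[0]                  # every row is a multiple of (1, 2)
x = [4, 3]

c = Fraction(dot(a, x), dot(a, a))         # coefficient a^T x / a^T a
x_r = [c * ai for ai in a]                 # row space component
x_n = [xi - ri for xi, ri in zip(x, x_r)]  # what is left over

print(x_r, x_n)
print([dot(row, x_n) for row in A])        # A x_n
print(dot(x_r, x_n))                       # the two pieces are perpendicular
```

This one-dimensional projection only covers the rank one case. The general splitting needs the projection onto a subspace.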


*If a combination of all n vectors gives $x_r + x_n = 0$, then $x_r = -x_n$ is in both subspaces. 
So $x_r = x_n = 0$. All coefficients of the row space basis and of the nullspace basis must be zero. 
This proves independence of the n vectors together. 
= REVIEW OF THE KEY IDEAS = 


1. Subspaces V and W are orthogonal if every v in V is orthogonal to every w in W. 


2. V and W are “orthogonal complements” if W contains all vectors perpendicular to 
V (and vice versa). Inside $R^n$, the dimensions of complements V and W add to n. 


3. The nullspace N(A) and the row space $C(A^T)$ are orthogonal complements, with 
dimensions (n − r) + r = n. Similarly $N(A^T)$ and C(A) are orthogonal complements 
with (m − r) + r = m. 


4. Any n independent vectors in $R^n$ span $R^n$. Any n spanning vectors are independent. 


= WORKED EXAMPLES = 


4.1 A Suppose S is a six-dimensional subspace of nine-dimensional space $R^9$. 
(a) What are the possible dimensions of subspaces orthogonal to S? 
(b) What are the possible dimensions of the orthogonal complement $S^\perp$ of S? 
(c) What is the smallest possible size of a matrix A that has row space S? 
(d) What is the smallest possible size of a matrix B that has nullspace $S^\perp$? 


Solution 
(a) If S is six-dimensional in $R^9$, subspaces orthogonal to S can have dimensions 0, 1, 2, 3. 
(b) The complement $S^\perp$ is the largest orthogonal subspace, with dimension 3. 
(c) The smallest matrix A is 6 by 9 (its six rows will be a basis for S). 
(d) This is the same as question (c)! 


If a new row 7 of B is a combination of the six rows of A, then B has the same row 
space as A. It also has the same nullspace. The special solutions $s_1, s_2, s_3$ to Ax = 0 
will be the same for Bx = 0. Elimination will change row 7 of B to all zeros. 


4.1 B The equation x − 3y − 4z = 0 describes a plane P in $R^3$ (actually a subspace). 
(a) The plane P is the nullspace N(A) of what 1 by 3 matrix A? Ans: A = [1 −3 −4]. 


(b) Find a basis $s_1, s_2$ of special solutions of x − 3y − 4z = 0 (these would be the 
columns of the nullspace matrix N). Answer: $s_1 = (3, 1, 0)$ and $s_2 = (4, 0, 1)$. 


(c) Find a basis for the line $P^\perp$ that is perpendicular to P. Answer: (1, −3, −4)! 
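The three answers fit together: each special solution is perpendicular to the single row of A. A minimal sketch in plain Python, where `special_solution` is an illustrative helper that sets the free variables y and z and solves x − 3y − 4z = 0 for x:

```python
def dot(v, w):
    return sum(a * b for a, b in zip(v, w))

row = [1, -3, -4]                      # the 1 by 3 matrix A, also the basis for P-perp

def special_solution(y, z):
    """Choose the free variables y, z and solve x - 3y - 4z = 0 for x."""
    return [3 * y + 4 * z, y, z]

s1 = special_solution(1, 0)
s2 = special_solution(0, 1)

print(s1, s2)
print(dot(row, s1), dot(row, s2))      # both dot products should be zero
```

The plane P has dimension 2 and the line $P^\perp$ has dimension 1, adding to 3.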
Problem Set 4.1 


Questions 1-12 grow out of Figures 4.2 and 4.3 with four subspaces. 


1 Construct any 2 by 3 matrix of rank one. Copy Figure 4.2 and put one vector in each 
subspace (and put two in the nullspace). Which vectors are orthogonal? 


2 Redraw Figure 4.3 for a 3 by 2 matrix of rank r = 2. Which subspace is Z (zero 
vector only)? The nullspace part of any vector x in $R^2$ is $x_n$ = ____. 


3 Construct a matrix with the required property or say why that is impossible: 


1 2 
(a) Column space contains a and -| , nullspace contains H 


1 2 
(b) Row space contains | 2 and -3], nullspace contains E 


(c) Az = H has a solution and AT H = H 
(d) Every row is orthogonal to every column (A is not the zero matrix) 


(e) Columns add up to a column of zeros, rows add to a row of 1’s. 


4 If AB = 0 then the columns of B are in the ____ of A. The rows of A are in the 
____ of B. With AB = 0, why can’t A and B be 3 by 3 matrices of rank 2? 


5 (a) If Ax = b has a solution and $A^Ty = 0$, is ($y^Tx = 0$) or ($y^Tb = 0$)? 
(b) If $A^Ty = (1, 1, 1)$ has a solution and Ax = 0, then ____. 
6 This system of equations Ax = b has no solution (they lead to 0 = 1): 

$$\begin{aligned} x + 2y + 2z &= 5 \\ 2x + 2y + 3z &= 5 \\ 3x + 4y + 5z &= 9 \end{aligned}$$

Find numbers $y_1, y_2, y_3$ to multiply the equations so they add to 0 = 1. You have 
found a vector y in which subspace? Its dot product $y^Tb$ is 1, so no solution x. 


7 Every system with no solution is like the one in Problem 6. There are numbers 
$y_1, \ldots, y_m$ that multiply the m equations so they add up to 0 = 1. This is called 
Fredholm’s Alternative: 


Exactly one of these problems has a solution 
$Ax = b$ OR $A^Ty = 0$ with $y^Tb = 1$. 
If b is not in the column space of A, it is not orthogonal to the nullspace of $A^T$. 


Multiply the equations $x_1 - x_2 = 1$ and $x_2 - x_3 = 1$ and $x_1 - x_3 = 1$ by numbers 
$y_1, y_2, y_3$ chosen so that the equations add up to 0 = 1. 
8 In Figure 4.3, how do we know that $Ax_r$ is equal to Ax? How do we know that this 
vector is in the column space? If $A = \begin{bmatrix} 1 & 1 \\ 1 & 1 \end{bmatrix}$ and $x = \begin{bmatrix} 1 \\ 0 \end{bmatrix}$ what is $x_r$? 


9 If $A^TAx = 0$ then Ax = 0. Reason: Ax is in the nullspace of $A^T$ and also in the 
____ of A and those spaces are ____. Conclusion: $A^TA$ has the same nullspace 
as A. This key fact is repeated in the next section. 


10 Suppose A is a symmetric matrix ($A^T = A$). 


(a) Why is its column space perpendicular to its nullspace? 


(b) If Ax = 0 and Az = 5z, which subspaces contain these “eigenvectors” x 
and z? Symmetric matrices have perpendicular eigenvectors $x^Tz = 0$. 


11 (Recommended) Draw Figure 4.2 to show each subspace correctly for 


1 2 1 0 
a=Í; J and B=} a 


12 Find the pieces x, and £n and draw Figure 4.3 properly if 


Pel 9 
A=|0 0 and o= [5]. 
0 0 


Questions 13-23 are about orthogonal subspaces. 


13 Put bases for the subspaces V and W into the columns of matrices V and W. Explain 
why the test for orthogonal subspaces can be written $V^TW$ = zero matrix. 
This matches $v^Tw = 0$ for orthogonal vectors. 


14 The floor V and the wall W are not orthogonal subspaces, because they share a 
nonzero vector (along the line where they meet). No planes V and W in $R^3$ can be 
orthogonal! Find a vector in the column spaces of both matrices: 

$$A = \begin{bmatrix} 1 & 2 \\ 1 & 3 \\ 1 & 2 \end{bmatrix} \quad\text{and}\quad B = \begin{bmatrix} 5 & 4 \\ 6 & 3 \\ 5 & 1 \end{bmatrix}$$

This will be a vector Ax and also $B\hat{x}$. Think 3 by 4 with the matrix [A B]. 


15 Extend Problem 14 to a p-dimensional subspace V and a q-dimensional subspace 
W of $R^n$. What inequality on p + q guarantees that V intersects W in a nonzero 
vector? These subspaces cannot be orthogonal. 


16 Prove that every y in $N(A^T)$ is perpendicular to every Ax in the column space, 
using the matrix shorthand of equation (2). Start from $A^Ty = 0$. 
17 If S is the subspace of $R^3$ containing only the zero vector, what is $S^\perp$? If S is 
spanned by (1, 1, 1), what is $S^\perp$? If S is spanned by (1, 1, 1) and (1, 1, −1), what 
is a basis for $S^\perp$? 


18 Suppose S only contains two vectors (1, 5, 1) and (2, 2, 2) (not a subspace). Then 
$S^\perp$ is the nullspace of the matrix A = ____. $S^\perp$ is a subspace even if S is not. 


19 Suppose L is a one-dimensional subspace (a line) in $R^3$. Its orthogonal complement 
$L^\perp$ is the ____ perpendicular to L. Then $(L^\perp)^\perp$ is a ____ perpendicular to $L^\perp$. 
In fact $(L^\perp)^\perp$ is the same as ____. 


20 Suppose V is the whole space $R^4$. Then $V^\perp$ contains only the vector ____. Then 
$(V^\perp)^\perp$ is ____. So $(V^\perp)^\perp$ is the same as ____. 


21 Suppose S is spanned by the vectors (1, 2, 2, 3) and (1, 3, 3, 2). Find two vectors 
that span $S^\perp$. This is the same as solving Ax = 0 for which A? 


22 If P is the plane of vectors in $R^4$ satisfying $x_1 + x_2 + x_3 + x_4 = 0$, write a basis 
for $P^\perp$. Construct a matrix that has P as its nullspace. 


23 If a subspace S is contained in a subspace V, prove that $S^\perp$ contains $V^\perp$. 


Questions 24-30 are about perpendicular columns and rows. 



24 Suppose an n by n matrix is invertible: $AA^{-1} = I$. Then the first column of $A^{-1}$ is 
orthogonal to the space spanned by which rows of A? 


25 Find $A^TA$ if the columns of A are unit vectors, all mutually perpendicular. 


26 Construct a 3 by 3 matrix A with no zero entries whose columns are mutually 
perpendicular. Compute $A^TA$. Why is it a diagonal matrix? 


27 The lines $3x + y = b_1$ and $6x + 2y = b_2$ are ____. They are the same line if ____. 
In that case $(b_1, b_2)$ is perpendicular to the vector ____. The nullspace of the matrix 
is the line 3x + y = ____. One particular vector in that nullspace is ____. 


28 Why is each of these statements false? 
(a) (1, 1, 1) is perpendicular to (1, 1, −2) so the planes x + y + z = 0 and 
x + y − 2z = 0 are orthogonal subspaces. 
(b) The subspace spanned by (1, 1, 0, 0, 0) and (0, 0, 0, 1, 1) is the orthogonal 
complement of the subspace spanned by (1, −1, 0, 0, 0) and (2, −2, 3, 4, −4). 
(c) Two subspaces that meet only in the zero vector are orthogonal. 


29 Find a matrix with v = (1, 2, 3) in the row space and column space. Find another 
matrix with v in the nullspace and column space. Which pairs of subspaces can v 
not be in? 
Challenge Problems 


30 Suppose A is 3 by 4 and B is 4 by 5 and AB = 0. So N(A) contains C(B). 
Prove from the dimensions of N(A) and C(B) that rank(A) + rank(B) ≤ 4. 


31 The command N = null(A) will produce a basis for the nullspace of A. Then the 
command B = null(N') will produce a basis for the ____ of A. 


32 Suppose I give you four nonzero vectors r, n, c, l in $R^2$. 


(a) What are the conditions for those to be bases for the four fundamental 
subspaces $C(A^T)$, $N(A)$, $C(A)$, $N(A^T)$ of a 2 by 2 matrix? 


(b) What is one possible matrix A? 


33 Suppose I give you eight vectors $r_1, r_2, n_1, n_2, c_1, c_2, l_1, l_2$ in $R^4$. 


(a) What are the conditions for those pairs to be bases for the four fundamental 
subspaces of a 4 by 4 matrix? 


(b) What is one possible matrix A? 
4.2 Projections 


1 The projection of a vector b onto the line through a is the closest point $p = a(a^Tb/a^Ta)$. 

2 The error $e = b - p$ is perpendicular to a: Right triangle b p e has $\|p\|^2 + \|e\|^2 = \|b\|^2$. 

3 The projection of b onto a subspace S is the closest vector p in S; b − p is orthogonal to S. 

4 $A^TA$ is invertible (and symmetric) only if A has independent columns: $N(A^TA) = N(A)$. 

5 Then the projection of b onto the column space of A is the vector $p = A(A^TA)^{-1}A^Tb$. 

6 The projection matrix onto C(A) is $P = A(A^TA)^{-1}A^T$. It has p = Pb and $P^2 = P = P^T$. 


May we start this section with two questions? (In addition to that one.) The first
question aims to show that projections are easy to visualize. The second question is about
"projection matrices": symmetric matrices with $P^2 = P$. The projection of b is Pb.


1 What are the projections of b = (2, 3, 4) onto the z axis and the xy plane?
2 What matrices $P_1$ and $P_2$ produce those projections onto a line and a plane?


When b is projected onto a line, its projection p is the part of b along that line.
If b is projected onto a plane, p is the part in that plane. The projection p is Pb.


The projection matrix P multiplies b to give p. This section finds p and also P.


The projection onto the z axis we call $p_1$. The second projection drops straight down to
the xy plane. The picture in your mind should be Figure 4.5. Start with b = (2, 3, 4).
The projection across gives $p_1 = (0, 0, 4)$. The projection down gives $p_2 = (2, 3, 0)$.
Those are the parts of b along the z axis and in the xy plane.

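The projection matrices $P_1$ and $P_2$ are 3 by 3. They multiply b with 3 components
to produce p with 3 components. Projection onto a line comes from a rank one matrix.
Projection onto a plane comes from a rank two matrix:


Projection matrix

$$\text{Onto the } z \text{ axis: } P_1 = \begin{bmatrix} 0 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 1 \end{bmatrix} \qquad \text{Onto the } xy \text{ plane: } P_2 = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 0 \end{bmatrix}$$


$P_1$ picks out the z component of every vector. $P_2$ picks out the x and y components.
To find the projections $p_1$ and $p_2$ of b, multiply b by $P_1$ and $P_2$ (small p for the vector,
capital P for the matrix that produces it):

$$p_1 = P_1 b = \begin{bmatrix} 0 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} x \\ y \\ z \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ z \end{bmatrix} \qquad p_2 = P_2 b = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 0 \end{bmatrix} \begin{bmatrix} x \\ y \\ z \end{bmatrix} = \begin{bmatrix} x \\ y \\ 0 \end{bmatrix}$$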
In this case the projections $p_1$ and $p_2$ are perpendicular. The xy plane and the z axis
are orthogonal subspaces, like the floor of a room and the line between two walls.


Figure 4.5: The projections $p_1 = P_1 b$ and $p_2 = P_2 b$ onto the z axis and the xy plane.


More than just orthogonal, the line and plane are orthogonal complements. Their 
dimensions add to 1 + 2 = 3. Every vector b in the whole space is the sum of its parts in 
the two subspaces. The projections $p_1$ and $p_2$ are exactly those two parts of b:


The vectors give $p_1 + p_2 = b$. The matrices give $P_1 + P_2 = I$. (1)
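The opening example can be checked numerically. The following is a minimal sketch in plain Python (the helper name `matvec` is an illustrative choice, not notation from the text): it applies the two projection matrices to b = (2, 3, 4) and confirms that the two parts are perpendicular and add back to b.

```python
# Projection matrices onto the z axis (P1) and the xy plane (P2) in R^3.
P1 = [[0, 0, 0], [0, 0, 0], [0, 0, 1]]
P2 = [[1, 0, 0], [0, 1, 0], [0, 0, 0]]
b = [2, 3, 4]

def matvec(M, v):
    """Multiply a matrix (stored as a list of rows) by a vector."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

p1 = matvec(P1, b)   # part of b along the z axis
p2 = matvec(P2, b)   # part of b in the xy plane

# The two parts are perpendicular (dot product 0) and they add back to b.
dot_p1_p2 = sum(x * y for x, y in zip(p1, p2))
p_sum = [x + y for x, y in zip(p1, p2)]

# Adding the matrices entry by entry gives the identity: P1 + P2 = I.
P_sum = [[x + y for x, y in zip(r1, r2)] for r1, r2 in zip(P1, P2)]

print(p1, p2, dot_p1_p2, p_sum)
```

The same check works for any b, because $P_1 + P_2 = I$ holds for the matrices themselves and not only for this particular vector.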


This is perfect. Our goal is reached—for this example. We have the same goal for any line 
and any plane and any n-dimensional subspace. The object is to find the part p in each 
subspace, and the projection matrix P that produces that part p = Pb. Every subspace 
of $\mathbf{R}^m$ has its own m by m projection matrix. To compute P, we absolutely need a good
description of the subspace that it projects onto. 

The best description of a subspace is a basis. We put the basis vectors into the columns 
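of A. Now we are projecting onto the column space of A! Certainly the z axis is the
column space of the 3 by 1 matrix $A_1$. The xy plane is the column space of $A_2$. That plane
is also the column space of $A_3$ (a subspace has many bases). So $p_2 = p_3$ and $P_2 = P_3$.

$$A_1 = \begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix} \quad\text{and}\quad A_2 = \begin{bmatrix} 1 & 0 \\ 0 & 1 \\ 0 & 0 \end{bmatrix} \quad\text{and}\quad A_3 = \begin{bmatrix} 1 & 2 \\ 2 & 3 \\ 0 & 0 \end{bmatrix}$$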


Our problem is to project any b onto the column space of any m by n matrix. 
Start with a line (dimension n = 1). The matrix A will have only one column. Call it a. 


Projection Onto a Line 


A line goes through the origin in the direction of $a = (a_1, \ldots, a_m)$. Along that line,
we want the point p closest to $b = (b_1, \ldots, b_m)$. The key to projection is orthogonality:
The line from b to p is perpendicular to the vector a. This is the dotted line marked 
e = b — p for the error on the left side of Figure 4.6. We now compute p by algebra. 
The projection p will be some multiple of a. Call it $p = \hat{x}a$ = "x hat" times a.
Computing this number $\hat{x}$ will give the vector p. Then from the formula for p, we will
read off the projection matrix P. These three steps will lead to all projection matrices:
find $\hat{x}$, then find the vector p, then find the matrix P.

The dotted line b − p is the "error" $e = b - \hat{x}a$. It is perpendicular to a, and this will
determine $\hat{x}$. Use the fact that $b - \hat{x}a$ is perpendicular to a when their dot product is zero:


Projecting b onto a with error $e = b - \hat{x}a$

$$a \cdot (b - \hat{x}a) = 0 \quad\text{or}\quad a \cdot b - \hat{x}\, a \cdot a = 0 \quad\text{or}\quad \hat{x} = \frac{a \cdot b}{a \cdot a} = \frac{a^Tb}{a^Ta}. \qquad (2)$$


The multiplication $a^Tb$ is the same as $a \cdot b$. Using the transpose is better, because it
applies also to matrices. Our formula $\hat{x} = a^Tb/a^Ta$ gives the projection $p = \hat{x}a$.


Figure 4.6: The projection p of b onto a line and onto S = column space of A.
(Labels in the figure: $p = A\hat{x} = A(A^TA)^{-1}A^Tb = Pb$.)


The projection of b onto the line through a is the vector $p = \hat{x}a = \dfrac{a^Tb}{a^Ta}\,a$.


Special case 1: If b = a then $\hat{x} = 1$. The projection of a onto a is itself. Pa = a.


Special case 2: If b is perpendicular to a then $a^Tb = 0$. The projection is p = 0.


Example 1 Project $b = \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix}$ onto $a = \begin{bmatrix} 1 \\ 2 \\ 2 \end{bmatrix}$ to find $p = \hat{x}a$ in Figure 4.6.


Solution The number $\hat{x}$ is the ratio of $a^Tb = 5$ to $a^Ta = 9$. So the projection is $p = \frac{5}{9}a$.
The error vector between b and p is e = b − p. Those vectors p and e will add to
b = (1, 1, 1):

$$p = \frac{5}{9}a = \left(\frac{5}{9}, \frac{10}{9}, \frac{10}{9}\right) \quad\text{and}\quad e = b - p = \left(\frac{4}{9}, -\frac{1}{9}, -\frac{1}{9}\right).$$
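Example 1 can be reproduced with exact fractions. This is a small sketch, assuming integer (or `Fraction`) entries; the function name `project_onto_line` is a hypothetical helper introduced here for illustration.

```python
from fractions import Fraction

def dot(u, v):
    """Dot product u^T v of two vectors of equal length."""
    return sum(x * y for x, y in zip(u, v))

def project_onto_line(b, a):
    """Return (x_hat, p, e) for the projection of b onto the line through a."""
    x_hat = Fraction(dot(a, b), dot(a, a))      # x_hat = a^T b / a^T a
    p = [x_hat * ai for ai in a]                # p = x_hat * a
    e = [bi - pi for bi, pi in zip(b, p)]       # e = b - p
    return x_hat, p, e

a = [1, 2, 2]
b = [1, 1, 1]
x_hat, p, e = project_onto_line(b, a)

print(x_hat)                      # the ratio a^T b / a^T a
print([str(x) for x in p])        # components of the projection
print([str(x) for x in e])        # components of the error
print(dot(a, e))                  # the error is perpendicular to a
```

Working with `Fraction` avoids rounding, so the perpendicularity check $a^Te = 0$ holds exactly rather than only up to floating-point error.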


Look at the right triangle of b, p, and e. The vector b is split into two parts—its 
component along the line is p, its perpendicular part is e. Those two sides p and e 


have length $\|p\| = \|b\| \cos\theta$ and $\|e\| = \|b\| \sin\theta$. Trigonometry matches the dot product:

$$p = \frac{a^Tb}{a^Ta}\,a \quad\text{has length}\quad \|p\| = \frac{\|a\|\,\|b\| \cos\theta}{\|a\|^2}\,\|a\| = \|b\| \cos\theta. \qquad (3)$$


The dot product is a lot simpler than getting involved with $\cos\theta$ and the length of b.
The example has square roots in $\cos\theta = 5/3\sqrt{3}$ and $\|b\| = \sqrt{3}$. There are no square
roots in the projection p = 5a/9. The good way to 5/9 is $a^Tb/a^Ta$.


Now comes the projection matrix. In the formula for p, what matrix is multiplying b? 
You can see the matrix better if the number $\hat{x}$ is on the right side of a:

$$\textbf{Projection matrix } P \qquad p = a\hat{x} = a\,\frac{a^Tb}{a^Ta} = Pb \quad\text{when the matrix is}\quad P = \frac{aa^T}{a^Ta}.$$


P is a column times a row! The column is a, the row is aT. Then divide by the number 
aTa. The projection matrix P is m by m, but its rank is one. We are projecting onto a 
one-dimensional subspace, the line through a. That line is the column space of P. 


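Example 2 Find the projection matrix $P = \dfrac{aa^T}{a^Ta}$ onto the line through $a = \begin{bmatrix} 1 \\ 2 \\ 2 \end{bmatrix}$.


Solution Multiply column a times row $a^T$ and divide by $a^Ta = 9$:

$$\text{Projection matrix}\quad P = \frac{aa^T}{a^Ta} = \frac{1}{9} \begin{bmatrix} 1 \\ 2 \\ 2 \end{bmatrix} \begin{bmatrix} 1 & 2 & 2 \end{bmatrix} = \frac{1}{9} \begin{bmatrix} 1 & 2 & 2 \\ 2 & 4 & 4 \\ 2 & 4 & 4 \end{bmatrix}.$$


This matrix projects any vector b onto a. Check p = Pb for b = (1, 1, 1) in Example 1:

$$p = Pb = \frac{1}{9} \begin{bmatrix} 1 & 2 & 2 \\ 2 & 4 & 4 \\ 2 & 4 & 4 \end{bmatrix} \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix} = \frac{1}{9} \begin{bmatrix} 5 \\ 10 \\ 10 \end{bmatrix} \quad\text{which is correct.}$$


If the vector a is doubled, the matrix P stays the same! It still projects onto the same line.
If the matrix is squared, $P^2$ equals P. Projecting a second time doesn't change anything,
so $P^2 = P$. The diagonal entries of P add up to $\frac{1}{9}(1 + 4 + 4) = 1$.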
The matrix I − P should be a projection too. It produces the other side e of the triangle:
the perpendicular part of b. Note that (I − P)b equals b − p which is e in the left nullspace.


When P projects onto one subspace, I − P projects onto the perpendicular subspace.
Here I − P projects onto the plane perpendicular to a.
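The properties claimed for P and I − P in Example 2 can be verified directly. The sketch below uses exact fractions; `line_projection_matrix` and `matmul` are illustrative helper names, and Q stands for I − P.

```python
from fractions import Fraction

def line_projection_matrix(a):
    """Return P = a a^T / (a^T a) as a list of rows with exact fractions."""
    aTa = sum(x * x for x in a)
    return [[Fraction(ai * aj, aTa) for aj in a] for ai in a]

def matmul(X, Y):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)]
            for row in X]

a = [1, 2, 2]
n = len(a)
P = line_projection_matrix(a)
P_doubled = line_projection_matrix([2 * x for x in a])   # same line through a

# Q = I - P projects onto the plane perpendicular to a.
Q = [[Fraction(int(i == j)) - P[i][j] for j in range(n)] for i in range(n)]

trace_P = sum(P[i][i] for i in range(n))
Qa = [sum(Q[i][j] * a[j] for j in range(n)) for i in range(n)]

print(P == P_doubled)       # doubling a leaves P unchanged
print(matmul(P, P) == P)    # P^2 = P
print(matmul(Q, Q) == Q)    # (I - P)^2 = I - P
print(trace_P)              # diagonal entries add to 1
print(Qa == [0, 0, 0])      # I - P sends a itself to zero
```

The last check reflects the geometry: a has no component in the plane perpendicular to a, so (I − P)a is the zero vector.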


Now we move beyond projection onto a line. Projecting onto an n-dimensional
subspace of $\mathbf{R}^m$ takes more effort. The crucial formulas will be collected in equations
(5)-(6)-(7). Basically you need to remember those three equations.


Projection Onto a Subspace 


Start with n vectors $a_1, \ldots, a_n$ in $\mathbf{R}^m$. Assume that these a's are linearly independent.


Problem: Find the combination $p = \hat{x}_1a_1 + \cdots + \hat{x}_na_n$ closest to a given vector b.
We are projecting each b in $\mathbf{R}^m$ onto the subspace spanned by the a's.

With n = 1 (one vector $a_1$) this is projection onto a line. The line is the column space
of A, which has just one column. In general the matrix A has n columns $a_1, \ldots, a_n$.

The combinations in $\mathbf{R}^m$ are the vectors Ax in the column space. We are looking for
the particular combination $p = A\hat{x}$ (the projection) that is closest to b. The hat over $\hat{x}$
indicates the best choice $\hat{x}$, to give the closest vector in the column space. That choice is
$\hat{x} = a^Tb/a^Ta$ when n = 1. For n > 1, the best $\hat{x} = (\hat{x}_1, \ldots, \hat{x}_n)$ is to be found now.


We compute projections onto n-dimensional subspaces in three steps as before:
Find the vector $\hat{x}$, find the projection $p = A\hat{x}$, find the projection matrix P.


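The key is in the geometry! The dotted line in Figure 4.6 goes from b to the nearest
point $A\hat{x}$ in the subspace. This error vector $b - A\hat{x}$ is perpendicular to the subspace.
The error $b - A\hat{x}$ makes a right angle with all the vectors $a_1, \ldots, a_n$ in the base.
The n right angles give the n equations for $\hat{x}$:

$$\begin{matrix} a_1^T(b - A\hat{x}) = 0 \\ \vdots \\ a_n^T(b - A\hat{x}) = 0 \end{matrix} \quad\text{or}\quad \begin{bmatrix} \text{---}\; a_1^T \;\text{---} \\ \vdots \\ \text{---}\; a_n^T \;\text{---} \end{bmatrix} \begin{bmatrix} \\ b - A\hat{x} \\ \; \end{bmatrix} = \begin{bmatrix} 0 \\ \vdots \\ 0 \end{bmatrix}. \qquad (4)$$


The matrix with those rows $a_i^T$ is $A^T$. The n equations are exactly $A^T(b - A\hat{x}) = 0$.
Rewrite $A^T(b - A\hat{x}) = 0$ in its famous form $A^TA\hat{x} = A^Tb$. This is the equation
for $\hat{x}$, and the coefficient matrix is $A^TA$. Now we can find $\hat{x}$ and p and P, in that order.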
The combination $p = \hat{x}_1a_1 + \cdots + \hat{x}_na_n = A\hat{x}$ that is closest to b comes from $\hat{x}$:

$$\text{Find } \hat{x}\ (n \times 1) \qquad A^T(b - A\hat{x}) = 0 \quad\text{or}\quad A^TA\hat{x} = A^Tb. \qquad (5)$$


This symmetric matrix $A^TA$ is n by n. It is invertible if the a's are independent.
The solution is $\hat{x} = (A^TA)^{-1}A^Tb$. The projection of b onto the subspace is p:

$$\text{Find } p\ (m \times 1) \qquad p = A\hat{x} = A(A^TA)^{-1}A^Tb. \qquad (6)$$

The next formula picks out the projection matrix that is multiplying b in (6):

$$\text{Find } P\ (m \times m) \qquad P = A(A^TA)^{-1}A^T. \qquad (7)$$


Compare with projection onto a line, when A has only one column: $A^TA$ is $a^Ta$.

$$\text{For } n = 1 \qquad \hat{x} = \frac{a^Tb}{a^Ta} \quad\text{and}\quad p = a\,\frac{a^Tb}{a^Ta} \quad\text{and}\quad P = \frac{aa^T}{a^Ta}.$$


Those formulas are identical with (5) and (6) and (7). The number $a^Ta$ becomes the
matrix $A^TA$. When it is a number, we divide by it. When it is a matrix, we invert it.
The new formulas contain $(A^TA)^{-1}$ instead of $1/a^Ta$. The linear independence of the
columns $a_1, \ldots, a_n$ will guarantee that this inverse matrix exists.

The key step was $A^T(b - A\hat{x}) = 0$. We used geometry (e is orthogonal to each a).
Linear algebra gives this "normal equation" too, in a very quick and beautiful way:


1. Our subspace is the column space of A. 
2. The error vector $b - A\hat{x}$ is perpendicular to that column space.
3. Therefore $b - A\hat{x}$ is in the nullspace of $A^T$! This means $A^T(b - A\hat{x}) = 0$.


The left nullspace is important in projections. That nullspace of $A^T$ contains the error
vector $e = b - A\hat{x}$. The vector b is being split into the projection p and the error e = b − p.
Projection produces a right triangle with sides p, e, and b. 


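Example 3 If $A = \begin{bmatrix} 1 & 0 \\ 1 & 1 \\ 1 & 2 \end{bmatrix}$ and $b = \begin{bmatrix} 6 \\ 0 \\ 0 \end{bmatrix}$ find $\hat{x}$ and p and P.


Solution Compute the square matrix $A^TA$ and also the vector $A^Tb$:

$$A^TA = \begin{bmatrix} 1 & 1 & 1 \\ 0 & 1 & 2 \end{bmatrix} \begin{bmatrix} 1 & 0 \\ 1 & 1 \\ 1 & 2 \end{bmatrix} = \begin{bmatrix} 3 & 3 \\ 3 & 5 \end{bmatrix} \quad\text{and}\quad A^Tb = \begin{bmatrix} 1 & 1 & 1 \\ 0 & 1 & 2 \end{bmatrix} \begin{bmatrix} 6 \\ 0 \\ 0 \end{bmatrix} = \begin{bmatrix} 6 \\ 0 \end{bmatrix}.$$

Now solve the normal equation $A^TA\hat{x} = A^Tb$ to find $\hat{x}$:

$$\begin{bmatrix} 3 & 3 \\ 3 & 5 \end{bmatrix} \begin{bmatrix} \hat{x}_1 \\ \hat{x}_2 \end{bmatrix} = \begin{bmatrix} 6 \\ 0 \end{bmatrix} \quad\text{gives}\quad \hat{x} = \begin{bmatrix} \hat{x}_1 \\ \hat{x}_2 \end{bmatrix} = \begin{bmatrix} 5 \\ -3 \end{bmatrix}. \qquad (8)$$


The combination $p = A\hat{x}$ is the projection of b onto the column space of A:

$$p = 5 \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix} - 3 \begin{bmatrix} 0 \\ 1 \\ 2 \end{bmatrix} = \begin{bmatrix} 5 \\ 2 \\ -1 \end{bmatrix}. \quad\text{The error is}\quad e = b - p = \begin{bmatrix} 1 \\ -2 \\ 1 \end{bmatrix}. \qquad (9)$$


Two checks on the calculation. First, the error e = (1, −2, 1) is perpendicular to both
columns (1, 1, 1) and (0, 1, 2). Second, the matrix P times b = (6, 0, 0) correctly gives
p = (5, 2, −1). That solves the problem for one particular b, as soon as we find P.

The projection matrix is $P = A(A^TA)^{-1}A^T$. The determinant of $A^TA$ is 15 − 9 = 6;
then $(A^TA)^{-1}$ is easy. Multiply A times $(A^TA)^{-1}$ times $A^T$ to reach P:

$$(A^TA)^{-1} = \frac{1}{6} \begin{bmatrix} 5 & -3 \\ -3 & 3 \end{bmatrix} \quad\text{and}\quad P = \frac{1}{6} \begin{bmatrix} 5 & 2 & -1 \\ 2 & 2 & 2 \\ -1 & 2 & 5 \end{bmatrix}. \qquad (10)$$


We must have $P^2 = P$, because a second projection doesn't change the first projection.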
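The three steps of Example 3 (find $\hat{x}$, then p, then P) can be followed in a short script. This is a sketch under the assumption that A has exactly two independent columns, so that $A^TA$ is 2 by 2 and can be inverted by the explicit formula; the helper names are illustrative only.

```python
from fractions import Fraction

def transpose(M):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*M)]

def matmul(X, Y):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)]
            for row in X]

def inverse_2x2(M):
    """Invert a 2 by 2 matrix with the determinant formula."""
    (a, b), (c, d) = M
    det = a * d - b * c
    if det == 0:
        raise ValueError("matrix is singular: the columns of A are dependent")
    return [[Fraction(d, det), Fraction(-b, det)],
            [Fraction(-c, det), Fraction(a, det)]]

A = [[1, 0], [1, 1], [1, 2]]
b = [[6], [0], [0]]                      # b as a 3 by 1 column

At = transpose(A)
AtA = matmul(At, A)                      # the 2 by 2 matrix A^T A
Atb = matmul(At, b)                      # the right side A^T b
AtA_inv = inverse_2x2(AtA)

x_hat = matmul(AtA_inv, Atb)             # step 1: solve A^T A x_hat = A^T b
p = matmul(A, x_hat)                     # step 2: p = A x_hat
P = matmul(matmul(A, AtA_inv), At)       # step 3: P = A (A^T A)^{-1} A^T
e = [[bi[0] - pi[0]] for bi, pi in zip(b, p)]

print(AtA, Atb)
print(x_hat, p, e)
print(matmul(At, e))                     # A^T e = 0: e is in the left nullspace
print(matmul(P, P) == P, P == transpose(P))
```

For larger problems one would normally solve the normal equations by elimination rather than form the inverse; the explicit inverse is used here only because the example itself computes $(A^TA)^{-1}$.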


Warning The matrix $P = A(A^TA)^{-1}A^T$ is deceptive. You might try to split $(A^TA)^{-1}$
into $A^{-1}$ times $(A^T)^{-1}$. If you make that mistake, and substitute it into P, you will find
$P = AA^{-1}(A^T)^{-1}A^T$. Apparently everything cancels. This looks like P = I, the identity
matrix. We want to say why this is wrong.

The matrix A is rectangular. It has no inverse matrix. We cannot split $(A^TA)^{-1}$ into
$A^{-1}$ times $(A^T)^{-1}$ because there is no $A^{-1}$ in the first place.

In our experience, a problem that involves a rectangular matrix almost always leads to
$A^TA$. When A has independent columns, $A^TA$ is invertible. This fact is so crucial that we
state it clearly and give a proof.


$A^TA$ is invertible if and only if A has linearly independent columns.


Proof $A^TA$ is a square matrix (n by n). For every matrix A, we will now show that
$A^TA$ has the same nullspace as A. When the columns of A are linearly independent, its
nullspace contains only the zero vector. Then $A^TA$, with this same nullspace, is invertible.
Let A be any matrix. If x is in its nullspace, then Ax = 0. Multiplying by $A^T$ gives
$A^TAx = 0$. So x is also in the nullspace of $A^TA$.
Now start with the nullspace of $A^TA$. From $A^TAx = 0$ we must prove Ax = 0.
We can't multiply by $(A^T)^{-1}$, which generally doesn't exist. Just multiply by $x^T$:

$$(x^T)A^TAx = 0 \quad\text{or}\quad (Ax)^T(Ax) = 0 \quad\text{or}\quad \|Ax\|^2 = 0. \qquad (11)$$


We have shown: If $A^TAx = 0$ then Ax has length zero. Therefore Ax = 0. Every vector
x in one nullspace is in the other nullspace. If $A^TA$ has dependent columns, so has A.
If $A^TA$ has independent columns, so has A. This is the good case: $A^TA$ is invertible.

When A has independent columns, $A^TA$ is square, symmetric, and invertible.


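To repeat for emphasis: $A^TA$ is (n by m) times (m by n). Then $A^TA$ is square (n by n).
It is symmetric, because its transpose is $(A^TA)^T = A^T(A^T)^T$ which equals $A^TA$. We just
proved that $A^TA$ is invertible, provided A has independent columns. Watch the difference
between dependent and independent columns:

$$\underbrace{\begin{bmatrix} 1 & 1 & 0 \\ 2 & 2 & 0 \end{bmatrix}}_{A^T} \underbrace{\begin{bmatrix} 1 & 2 \\ 1 & 2 \\ 0 & 0 \end{bmatrix}}_{A\ \text{(dependent)}} = \underbrace{\begin{bmatrix} 2 & 4 \\ 4 & 8 \end{bmatrix}}_{A^TA\ \text{(singular)}} \qquad \underbrace{\begin{bmatrix} 1 & 1 & 0 \\ 2 & 2 & 1 \end{bmatrix}}_{A^T} \underbrace{\begin{bmatrix} 1 & 2 \\ 1 & 2 \\ 0 & 1 \end{bmatrix}}_{A\ \text{(indep.)}} = \underbrace{\begin{bmatrix} 2 & 4 \\ 4 & 9 \end{bmatrix}}_{A^TA\ \text{(invertible)}}$$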
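The dependent and independent cases can be compared numerically. In this sketch (helper names `gram_matrix` and `det_2x2` are illustrative), a zero determinant of $A^TA$ signals dependent columns, and the same vector x = (2, −1) lies in the nullspace of both A and $A^TA$ in the dependent case.

```python
def gram_matrix(A):
    """Return A^T A: the dot products of every pair of columns of A."""
    cols = list(zip(*A))
    return [[sum(x * y for x, y in zip(ci, cj)) for cj in cols] for ci in cols]

def det_2x2(M):
    """Determinant of a 2 by 2 matrix."""
    return M[0][0] * M[1][1] - M[0][1] * M[1][0]

def matvec(M, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

A_dependent = [[1, 2], [1, 2], [0, 0]]      # column 2 = 2 * column 1
A_independent = [[1, 2], [1, 2], [0, 1]]

G_dep = gram_matrix(A_dependent)
G_ind = gram_matrix(A_independent)

x = [2, -1]                                 # 2 * column 1 - column 2 = 0
print(G_dep, det_2x2(G_dep))                # singular
print(G_ind, det_2x2(G_ind))                # invertible
print(matvec(A_dependent, x), matvec(G_dep, x))
```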


Very brief summary To find the projection $p = \hat{x}_1a_1 + \cdots + \hat{x}_na_n$, solve $A^TA\hat{x} = A^Tb$.
This gives $\hat{x}$. The projection is $p = A\hat{x}$ and the error is $e = b - p = b - A\hat{x}$. The
projection matrix $P = A(A^TA)^{-1}A^T$ gives p = Pb.

This matrix satisfies $P^2 = P$. The distance from b to the subspace C(A) is $\|e\|$.


= REVIEW OF THE KEY IDEAS =


1. The projection of b onto the line through a is $p = a\hat{x} = a(a^Tb/a^Ta)$.

2. The rank one projection matrix $P = aa^T/a^Ta$ multiplies b to produce p.

3. Projecting b onto a subspace leaves e = b − p perpendicular to the subspace.

4. When A has full rank n, the equation $A^TA\hat{x} = A^Tb$ leads to $\hat{x}$ and $p = A\hat{x}$.

5. The projection matrix $P = A(A^TA)^{-1}A^T$ has $P^T = P$ and $P^2 = P$ and Pb = p.


= WORKED EXAMPLES = 


4.2 A Project the vector b = (3,4,4) onto the line through a = (2,2,1) and then 
onto the plane that also contains a* = (1,0,0). Check that the first error vector b — p 
is perpendicular to a, and the second error vector e* = b — p* is also perpendicular to a*. 

Find the 3 by 3 projection matrix P onto that plane of a and a*. Find a vector whose 
projection onto the plane is the zero vector. Why is it exactly the error e* ? 
Solution The projection of b = (3, 4, 4) onto the line through a = (2, 2, 1) is p = 2a:

$$\text{Onto a line}\qquad p = \frac{a^Tb}{a^Ta}\,a = \frac{18}{9}\,(2, 2, 1) = (4, 4, 2) = 2a.$$


The error vector e = b − p = (−1, 0, 2) is perpendicular to a = (2, 2, 1). So p is correct.
The plane of a = (2, 2, 1) and a* = (1, 0, 0) is the column space of A = [a a*]:

$$A = \begin{bmatrix} 2 & 1 \\ 2 & 0 \\ 1 & 0 \end{bmatrix} \qquad A^TA = \begin{bmatrix} 9 & 2 \\ 2 & 1 \end{bmatrix} \qquad (A^TA)^{-1} = \frac{1}{5} \begin{bmatrix} 1 & -2 \\ -2 & 9 \end{bmatrix} \qquad P = \begin{bmatrix} 1 & 0 & 0 \\ 0 & .8 & .4 \\ 0 & .4 & .2 \end{bmatrix}$$


Now p* = Pb = (3, 4.8, 2.4). The error e* = b − p* = (0, −.8, 1.6) is perpendicular to
a and a*. This e* is in the nullspace of P and its projection is zero! Note $P^2 = P = P^T$.
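The numbers in Worked Example 4.2 A can be checked by starting from the stated matrix P. The sketch below writes .8, .4, .2 as the exact fractions 4/5, 2/5, 1/5 (an equivalent rewriting, chosen here to avoid rounding) and confirms the claims about p*, e* and the nullspace of P.

```python
from fractions import Fraction as F

# The projection matrix from the worked example, with .8 = 4/5, .4 = 2/5, .2 = 1/5.
P = [[F(1), F(0), F(0)],
     [F(0), F(4, 5), F(2, 5)],
     [F(0), F(2, 5), F(1, 5)]]
b = [3, 4, 4]
a = [2, 2, 1]
a_star = [1, 0, 0]

def matvec(M, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def dot(u, v):
    """Dot product of two vectors."""
    return sum(x * y for x, y in zip(u, v))

p_star = matvec(P, b)                             # projection onto the plane
e_star = [bi - pi for bi, pi in zip(b, p_star)]   # error b - p*

print([float(x) for x in p_star])
print([float(x) for x in e_star])
print(dot(e_star, a), dot(e_star, a_star))        # both zero
print(matvec(P, e_star))                          # P e* = 0
print(matvec(P, a) == a, matvec(P, a_star) == a_star)
```

The final line checks that P leaves both a and a* unchanged, which is what a projection onto their plane must do.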


4.2 B Suppose your pulse is measured at x = 70 beats per minute, then at x = 80,
then at x = 120. Those three equations Ax = b in one unknown have $A^T = [1\ 1\ 1]$ and
b = (70, 80, 120). The best $\hat{x}$ is the ____ of 70, 80, 120. Use calculus and projection:


1. Minimize $E = (x - 70)^2 + (x - 80)^2 + (x - 120)^2$ by solving dE/dx = 0.
2. Project b = (70, 80, 120) onto a = (1, 1, 1) to find $\hat{x} = a^Tb/a^Ta$.


Solution The closest horizontal line to the heights 70, 80, 120 is the average $\hat{x} = 90$:

$$\frac{dE}{dx} = 2(x - 70) + 2(x - 80) + 2(x - 120) = 0 \quad\text{gives}\quad \hat{x} = \frac{70 + 80 + 120}{3} = 90.$$

$$\text{Also by projection:}\quad \hat{x} = \frac{a^Tb}{a^Ta} = \frac{(1, 1, 1)^T(70, 80, 120)}{(1, 1, 1)^T(1, 1, 1)} = \frac{70 + 80 + 120}{3} = 90.$$


In recursive least squares, a fourth measurement 130 changes the average $\hat{x}_{old} = 90$ to
$\hat{x}_{new} = 100$. Verify the update formula $\hat{x}_{new} = \hat{x}_{old} + \frac{1}{4}(130 - \hat{x}_{old})$. When a new
measurement arrives, we don't have to average all the old measurements again!
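The update formula generalizes to any number of measurements: with k old measurements, the gain is 1/(k + 1). A minimal sketch follows; the function name `update_average` is hypothetical and the starting value 0 is irrelevant because the first gain is 1.

```python
from fractions import Fraction

def update_average(x_old, count_old, b_new):
    """Return the average of count_old + 1 values, given the old average."""
    gain = Fraction(1, count_old + 1)
    return x_old + gain * (b_new - x_old)   # old average + gain * mismatch

x_hat = Fraction(0)
history = []
for count, b in enumerate([70, 80, 120, 130]):
    x_hat = update_average(x_hat, count, b)
    history.append(x_hat)

print([int(x) for x in history])   # running averages after each measurement
```

Only the current average and the count are stored, so the cost of each update does not grow with the number of past measurements.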


Problem Set 4.2 


Questions 1-9 ask for projections p onto lines. Also errors e = b — p and matrices P. 


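1 Project the vector b onto the line through a. Check that e is perpendicular to a:

$$\text{(a)}\ \ b = \begin{bmatrix} 1 \\ 2 \\ 2 \end{bmatrix} \text{ and } a = \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix} \qquad \text{(b)}\ \ b = \begin{bmatrix} 1 \\ 3 \\ 1 \end{bmatrix} \text{ and } a = \begin{bmatrix} -1 \\ -3 \\ -1 \end{bmatrix}.$$


2 Draw the projection of b onto a and also compute it from $p = \hat{x}a$:

$$\text{(a)}\ \ b = \begin{bmatrix} \cos\theta \\ \sin\theta \end{bmatrix} \text{ and } a = \begin{bmatrix} 1 \\ 0 \end{bmatrix} \qquad \text{(b)}\ \ b = \begin{bmatrix} 1 \\ 1 \end{bmatrix} \text{ and } a = \begin{bmatrix} 1 \\ -1 \end{bmatrix}.$$


3 In Problem 1, find the projection matrix $P = aa^T/a^Ta$ onto the line through each
vector a. Verify in both cases that $P^2 = P$. Multiply Pb in each case to compute
the projection p.


4 Construct the projection matrices $P_1$ and $P_2$ onto the lines through the a's in
Problem 2. Is it true that $(P_1 + P_2)^2 = P_1 + P_2$? This would be true if $P_1P_2 = 0$.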


5 Compute the projection matrices $aa^T/a^Ta$ onto the lines through $a_1 = (-1, 2, 2)$
and $a_2 = (2, 2, -1)$. Multiply those projection matrices and explain why their
product $P_1P_2$ is what it is.


6 Project b = (1, 0, 0) onto the lines through $a_1$ and $a_2$ in Problem 5 and also onto
$a_3 = (2, -1, 2)$. Add up the three projections $p_1 + p_2 + p_3$.


7 Continuing Problems 5-6, find the projection matrix $P_3$ onto $a_3 = (2, -1, 2)$. Verify
that $P_1 + P_2 + P_3 = I$. This is because the basis $a_1, a_2, a_3$ is orthogonal!


Questions 5-6-7: orthogonal. Questions 8-9-10: not orthogonal.


8 Project the vector b = (1, 1) onto the lines through $a_1 = (1, 0)$ and $a_2 = (1, 2)$.
Draw the projections $p_1$ and $p_2$ and add $p_1 + p_2$. The projections do not add to b
because the a's are not orthogonal.


9 In Problem 8, the projection of b onto the plane of $a_1$ and $a_2$ will equal b. Find
$P = A(A^TA)^{-1}A^T$ for $A = [a_1\ a_2] = \begin{bmatrix} 1 & 1 \\ 0 & 2 \end{bmatrix}$ = invertible matrix.


10 Project $a_1 = (1, 0)$ onto $a_2 = (1, 2)$. Then project the result back onto $a_1$. Draw
these projections and multiply the projection matrices $P_1P_2$: Is this a projection?
Questions 11-20 ask for projections, and projection matrices, onto subspaces.


11 Project b onto the column space of A by solving $A^TA\hat{x} = A^Tb$ and $p = A\hat{x}$:

$$\text{(a)}\ \ A = \begin{bmatrix} 1 & 1 \\ 0 & 1 \\ 0 & 0 \end{bmatrix} \text{ and } b = \begin{bmatrix} 2 \\ 3 \\ 4 \end{bmatrix} \qquad \text{(b)}\ \ A = \begin{bmatrix} 1 & 1 \\ 1 & 1 \\ 0 & 1 \end{bmatrix} \text{ and } b = \begin{bmatrix} 4 \\ 4 \\ 6 \end{bmatrix}.$$


Find e = b − p. It should be perpendicular to the columns of A.


12 Compute the projection matrices $P_1$ and $P_2$ onto the column spaces in Problem 11.
Verify that $P_1b$ gives the first projection $p_1$. Also verify $P_2^2 = P_2$.


13 (Quick and Recommended) Suppose A is the 4 by 4 identity matrix with its last 
column removed. A is 4 by 3. Project b = (1,2,3,4) onto the column space of A. 
What shape is the projection matrix P and what is P? 


14 Suppose b equals 2 times the first column of A. What is the projection of b onto 
the column space of A? Is P = I for sure in this case? Compute p and P when 
b = (0, 2, 4) and the columns of A are (0, 1, 2) and (1, 2, 0). 


15 If A is doubled, then $P = 2A(4A^TA)^{-1}2A^T$. This is the same as $A(A^TA)^{-1}A^T$.
The column space of 2A is the same as ____. Is $\hat{x}$ the same for A and 2A?


16 What linear combination of (1, 2, −1) and (1, 0, 1) is closest to b = (2, 1, 1)?


17 (Important) If $P^2 = P$ show that $(I - P)^2 = I - P$. When P projects onto the
column space of A, I − P projects onto the ____.


18 (a) If P is the 2 by 2 projection matrix onto the line through (1, 1), then I − P is
the projection matrix onto ____.

(b) If P is the 3 by 3 projection matrix onto the line through (1, 1, 1), then I − P
is the projection matrix onto ____.


19 To find the projection matrix onto the plane x − y − 2z = 0, choose two vectors in
that plane and make them the columns of A. The plane will be the column space of
A! Then compute $P = A(A^TA)^{-1}A^T$.


20 To find the projection matrix P onto the same plane x − y − 2z = 0, write down a
vector e that is perpendicular to that plane. Compute the projection $Q = ee^T/e^Te$
and then P = I − Q.


Questions 21-26 show that projection matrices satisfy $P^2 = P$ and $P^T = P$.


21 Multiply the matrix $P = A(A^TA)^{-1}A^T$ by itself. Cancel to prove that $P^2 = P$.
Explain why P(Pb) always equals Pb: The vector Pb is in the column space of A
so its projection onto that column space is ____.


22 Prove that $P = A(A^TA)^{-1}A^T$ is symmetric by computing $P^T$. Remember that the
inverse of a symmetric matrix is symmetric.


23 If A is square and invertible, the warning against splitting $(A^TA)^{-1}$ does not apply.
It is true that $AA^{-1}(A^T)^{-1}A^T = I$. When A is invertible, why is P = I? What is
the error e?


24 The nullspace of $A^T$ is ____ to the column space C(A). So if $A^Tb = 0$, the
projection of b onto C(A) should be p = ____. Check that $P = A(A^TA)^{-1}A^T$
gives this answer.


25 The projection matrix P onto an n-dimensional subspace of $\mathbf{R}^m$ has rank r = n.
Reason: The projections Pb fill the subspace S. So S is the ____ of P.


26 If an m by m matrix has $A^2 = A$ and its rank is m, prove that A = I.


27 The important fact that ends the section is this: If $A^TAx = 0$ then Ax = 0.
New Proof: The vector Ax is in the nullspace of ____. Ax is always in the column
space of ____. To be in both of those perpendicular spaces, Ax must be zero.


28 Use $P^T = P$ and $P^2 = P$ to prove that the length squared of column 2 always
equals the diagonal entry $P_{22}$. This number is $\frac{2}{6} = \frac{4}{36} + \frac{4}{36} + \frac{4}{36}$ for

$$P = \frac{1}{6} \begin{bmatrix} 5 & 2 & -1 \\ 2 & 2 & 2 \\ -1 & 2 & 5 \end{bmatrix}.$$


29 If B has rank m (full row rank, independent rows) show that $BB^T$ is invertible.


Challenge Problems


30 (a) Find the projection matrix $P_C$ onto the column space of A (after looking closely
at the matrix!)

$$A = \begin{bmatrix} 3 & 6 & 6 \\ 4 & 8 & 8 \end{bmatrix}$$

(b) Find the 3 by 3 projection matrix $P_R$ onto the row space of A. Multiply
$B = P_C A P_R$. Your answer B should be a little surprising. Can you explain it?


31 In $\mathbf{R}^m$, suppose I give you b and also a combination p of $a_1, \ldots, a_n$. How would
you test to see if p is the projection of b onto the subspace spanned by the a's?


32 Suppose $P_1$ is the projection matrix onto the 1-dimensional subspace spanned by
the first column of A. Suppose $P_2$ is the projection matrix onto the 2-dimensional
column space of A. After thinking a little, compute the product $P_2P_1$.


A= 


O N Fr 
=.. o 
Suppose you know the average $\hat{x}_{old}$ of $b_1, b_2, \ldots, b_{999}$. When $b_{1000}$ arrives, check
that the new average is a combination of $\hat{x}_{old}$ and the mismatch $b_{1000} - \hat{x}_{old}$:

$$\hat{x}_{new} = \frac{b_1 + \cdots + b_{1000}}{1000} = \frac{b_1 + \cdots + b_{999}}{999} + \frac{1}{1000}\left(b_{1000} - \frac{b_1 + \cdots + b_{999}}{999}\right).$$


This is a "Kalman filter" $\hat{x}_{new} = \hat{x}_{old} + \frac{1}{1000}(b_{1000} - \hat{x}_{old})$ with gain matrix $\frac{1}{1000}$.
The last page of the book extends the Kalman filter to matrix updates.


(2017) Suppose $P_1$ and $P_2$ are projection matrices ($P_i^2 = P_i = P_i^T$). Prove this fact:


$P_1P_2$ is a projection matrix if and only if $P_1P_2 = P_2P_1$.
4.3 Least Squares Approximations


1 Solving $A^TA\hat{x} = A^Tb$ gives the projection $p = A\hat{x}$ of b onto the column space of A.
2 When Ax = b has no solution, $\hat{x}$ is the "least-squares solution": $\|b - A\hat{x}\|^2$ = minimum.
3 Setting partial derivatives of $E = \|Ax - b\|^2$ to zero ($\partial E/\partial x_i = 0$) also produces $A^TA\hat{x} = A^Tb$.
4 To fit points $(t_1, b_1), \ldots, (t_m, b_m)$ by a straight line, A has columns $(1, \ldots, 1)$ and $(t_1, \ldots, t_m)$.
5 In that case $A^TA$ is the 2 by 2 matrix $\begin{bmatrix} m & \sum t_i \\ \sum t_i & \sum t_i^2 \end{bmatrix}$ and $A^Tb$ is the vector $\begin{bmatrix} \sum b_i \\ \sum t_ib_i \end{bmatrix}$.

It often happens that Ax = b has no solution. The usual reason is: too many equations.
The matrix A has more rows than columns. There are more equations than unknowns
(m is greater than n). The n columns span a small part of m-dimensional space. Unless all
measurements are perfect, b is outside that column space of A. Elimination reaches an
impossible equation and stops. But we can't stop just because measurements include noise!


To repeat: We cannot always get the error e = b − Ax down to zero. When e is zero,
x is an exact solution to Ax = b. When the length of e is as small as possible, $\hat{x}$ is a
least squares solution. Our goal in this section is to compute $\hat{x}$ and use it. These are real
problems and they need an answer.


The previous section emphasized p (the projection). This section emphasizes $\hat{x}$ (the
least squares solution). They are connected by $p = A\hat{x}$. The fundamental equation is still
$A^TA\hat{x} = A^Tb$. Here is a short unofficial way to reach this "normal equation":


When Ax = b has no solution, multiply by $A^T$ and solve $A^TA\hat{x} = A^Tb$.


Example 1 A crucial application of least squares is fitting a straight line to m points. 
Start with three points: Find the closest line to the points (0,6), (1,0), and (2,0). 


No straight line b = C + Dt goes through those three points. We are asking for two 
numbers C' and D that satisfy three equations: n = 2 and m = 3. Here are the three 
equations at t = 0, 1, 2 to match the given values b = 6, 0, 0: 


t = 0 The first point is on the line b = C + Dt if $C + D \cdot 0 = 6$
t = 1 The second point is on the line b = C + Dt if $C + D \cdot 1 = 0$
t = 2 The third point is on the line b = C + Dt if $C + D \cdot 2 = 0$.
This 3 by 2 system has no solution: b = (6, 0, 0) is not a combination of the columns
(1, 1, 1) and (0, 1, 2). Read off A, x, and b from those equations:

$$A = \begin{bmatrix} 1 & 0 \\ 1 & 1 \\ 1 & 2 \end{bmatrix} \qquad x = \begin{bmatrix} C \\ D \end{bmatrix} \qquad b = \begin{bmatrix} 6 \\ 0 \\ 0 \end{bmatrix} \qquad Ax = b \text{ is not solvable.}$$

The same numbers were in Example 3 in the last section. We computed $\hat{x} = (5, -3)$.
Those numbers are the best C and D, so 5 − 3t will be the best line for the 3 points.
We must connect projections to least squares, by explaining why $A^TA\hat{x} = A^Tb$.

In practical problems, there could easily be m = 100 points instead of m = 3. They
don't exactly match any straight line C + Dt. Our numbers 6, 0, 0 exaggerate the error so
you can see $e_1$, $e_2$, and $e_3$ in Figure 4.6.


Minimizing the Error 


How do we make the error e = b − Ax as small as possible? This is an important question
with a beautiful answer. The best x (called $\hat{x}$) can be found by geometry (the error
e meets the column space of A at 90°) and by algebra: $A^TA\hat{x} = A^Tb$. Calculus gives the
same $\hat{x}$: the derivative of the error $\|Ax - b\|^2$ is zero at $\hat{x}$.


By geometry Every Ax lies in the plane of the columns (1, 1, 1) and (0, 1, 2). In that
plane, we look for the point closest to b. The nearest point is the projection p.


The best choice for $A\hat{x}$ is p. The smallest possible error is e = b − p, perpendicular
to the columns. The three points at heights $(p_1, p_2, p_3)$ do lie on a line, because p is in the
column space of A. In fitting a straight line, $\hat{x}$ is the best choice for (C, D).


By algebra Every vector b splits into two parts. The part in the column space is p.
The perpendicular part is e. There is an equation we cannot solve (Ax = b). There is an
equation $A\hat{x} = p$ we can and do solve (by removing e and solving $A^TA\hat{x} = A^Tb$):

$$Ax = b = p + e \text{ is impossible} \qquad A\hat{x} = p \text{ is solvable} \qquad \hat{x} \text{ is } (A^TA)^{-1}A^Tb. \qquad (1)$$

The solution to $A\hat{x} = p$ leaves the least possible error (which is e):

$$\text{Squared length for any } x \qquad \|Ax - b\|^2 = \|Ax - p\|^2 + \|e\|^2. \qquad (2)$$


This is the law $c^2 = a^2 + b^2$ for a right triangle. The vector Ax − p in the column space
is perpendicular to e in the left nullspace. We reduce Ax − p to zero by choosing $x = \hat{x}$.
That leaves the smallest possible error $e = (e_1, e_2, e_3)$ which we can't reduce.
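The right-triangle law in equation (2) follows in one line from the orthogonality of e to the column space. The derivation below is a sketch that spells out that step, using only $b = p + e$, $p = A\hat{x}$ and $A^Te = 0$ from the text:

```latex
\begin{aligned}
\|Ax - b\|^2 &= \|(Ax - p) - e\|^2 \\
             &= \|Ax - p\|^2 - 2\,(Ax - p)^T e + \|e\|^2 \\
             &= \|Ax - p\|^2 - 2\,(x - \hat{x})^T A^T e + \|e\|^2
                \qquad \text{since } Ax - p = A(x - \hat{x}) \\
             &= \|Ax - p\|^2 + \|e\|^2
                \qquad \text{since } A^T e = 0 .
\end{aligned}
```

For instance, with the A and b of Example 1 and the trial choice x = (1, 1): Ax = (1, 2, 3), so $\|Ax - b\|^2 = 25 + 4 + 9 = 38$, while $\|Ax - p\|^2 = 16 + 0 + 16 = 32$ and $\|e\|^2 = 1 + 4 + 1 = 6$.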

Notice what "smallest" means. The squared length of Ax − b is minimized:


The least squares solution $\hat{x}$ makes $E = \|Ax - b\|^2$ as small as possible.


Figure 4.6a shows the closest line. It misses by distances $e_1, e_2, e_3 = 1, -2, 1$.
Those are vertical distances. The least squares line minimizes $E = e_1^2 + e_2^2 + e_3^2$.
Figure 4.6b shows the same problem in 3-dimensional space (b p e space). The vector
b is not in the column space of A. That is why we could not solve Ax = b. No line goes
through the three points. The smallest possible error is the perpendicular vector e. This is
$e = b - A\hat{x}$, the vector of errors (1, −2, 1) in the three equations. Those are the distances
from the best line. Behind both figures is the fundamental equation $A^TA\hat{x} = A^Tb$.


Figure 4.6: Best line and projection: Two pictures, same problem. The line has heights
p = (5, 2, −1) with errors e = (1, −2, 1). The equations $A^TA\hat{x} = A^Tb$ give $\hat{x} = (5, -3)$.
Same answer! The best line is b = 5 − 3t and the closest point is $p = 5a_1 - 3a_2$.


Notice that the errors 1, −2, 1 add to zero. Reason: The error $e = (e_1, e_2, e_3)$ is
perpendicular to the first column (1, 1, 1) in A. The dot product gives $e_1 + e_2 + e_3 = 0$.


By calculus Most functions are minimized by calculus! The graph bottoms out and the
derivative in every direction is zero. Here the error function E to be minimized is a sum of
squares $e_1^2 + e_2^2 + e_3^2$ (the square of the error in each equation):

$$E = \|Ax - b\|^2 = (C + D \cdot 0 - 6)^2 + (C + D \cdot 1)^2 + (C + D \cdot 2)^2. \qquad (3)$$


The unknowns are C and D. With two unknowns there are two derivatives, both zero
at the minimum. They are "partial derivatives" because $\partial E/\partial C$ treats D as constant and
$\partial E/\partial D$ treats C as constant:

$$\partial E/\partial C = 2(C + D \cdot 0 - 6) + 2(C + D \cdot 1) + 2(C + D \cdot 2) = 0$$
$$\partial E/\partial D = 2(C + D \cdot 0 - 6)(0) + 2(C + D \cdot 1)(1) + 2(C + D \cdot 2)(2) = 0.$$


$\partial E/\partial D$ contains the extra factors 0, 1, 2 from the chain rule. (The last derivative from
$(C + 2D)^2$ was 2 times C + 2D times that extra 2.) Those factors are just 1, 1, 1 in
$\partial E/\partial C$.
It is no accident that those factors 1, 1, 1 and 0, 1, 2 in the derivatives of $\|Ax - b\|^2$
are the columns of A. Now cancel 2 from every term and collect all C's and all D's:

$$\begin{matrix} \text{The } C \text{ derivative is zero:} & 3C + 3D = 6 \\ \text{The } D \text{ derivative is zero:} & 3C + 5D = 0 \end{matrix} \qquad \text{This matrix } \begin{bmatrix} 3 & 3 \\ 3 & 5 \end{bmatrix} \text{ is } A^TA \qquad (4)$$

These equations are identical with $A^TA\hat{x} = A^Tb$. The best C and D are the components
of $\hat{x}$. The equations from calculus are the same as the "normal equations" from linear
algebra. These are the key equations of least squares:


The partial derivatives of $\|Ax - b\|^2$ are zero when $A^TA\hat{x} = A^Tb$.


The solution is C = 5 and D = −3. Therefore b = 5 − 3t is the best line; it comes
closest to the three points. At t = 0, 1, 2 this line goes through p = 5, 2, −1.
It could not go through b = 6, 0, 0. The errors are 1, −2, 1. This is the vector e!

The Big Picture for Least Squares 


The key figure of this book shows the four subspaces and the true action of a matrix. The
vector x on the left side of Figure 4.3 went to b = Ax on the right side. In that figure x
was split into $x_r + x_n$. There were many solutions to Ax = b.

In this section the situation is just the opposite. There are no solutions to Ax = b.
Instead of splitting up x we are splitting up b = p + e. Figure 4.7 shows the big picture
for least squares. Instead of Ax = b we solve $A\hat{x} = p$. The error e = b − p is unavoidable.


Figure 4.7: The projection $p = A\hat{x}$ is closest to b, so $\hat{x}$ minimizes $E = \|b - Ax\|^2$.
(Labels in the figure: the row space is all of $\mathbf{R}^n$ and contains the best $\hat{x}$; independent
columns, nullspace = {0}; $A\hat{x} = p$ is solvable for p, p is in the column space; Ax = b is not
solvable for b, b is not in the column space; b = p + e.)


Notice how the nullspace N(A) is very small: just one point. With independent
columns, the only solution to Ax = 0 is x = 0. Then $A^TA$ is invertible. The equation
$A^TA\hat{x} = A^Tb$ fully determines the best vector $\hat{x}$. The error has $A^Te = 0$.
Chapter 7 will have the complete picture, with all four subspaces included. Every x splits
into $x_r + x_n$, and every b splits into p + e. The best solution is $\hat{x} = \hat{x}_r$ in the row space.
We can't help e and we don't want $x_n$ from the nullspace. This leaves $A\hat{x} = p$.


Fitting a Straight Line 


Fitting a line is the clearest application of least squares. It starts with m > 2 points, 
hopefully near a straight line. At times $t_1, \ldots, t_m$ those m points are at heights
$b_1, \ldots, b_m$. The best line $C + Dt$ misses the points by vertical distances $e_1, \ldots, e_m$.
No line is perfect, and the least squares line minimizes $E = e_1^2 + \cdots + e_m^2$.

The first example in this section had three points in Figure 4.6. Now we allow m points
(and m can be large). The two components of $\hat{x}$ are still C and D.

A line goes through the m points when we exactly solve Ax = b. Generally we can’t 
do it. Two unknowns C and D determine a line, so A has only n = 2 columns. To fit the 
m points, we are trying to solve m equations (and we only have two unknowns !). 


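$$Ax = b \quad\text{is}\quad \begin{aligned} C + Dt_1 &= b_1 \\ C + Dt_2 &= b_2 \\ &\;\;\vdots \\ C + Dt_m &= b_m \end{aligned} \quad\text{with}\quad A = \begin{bmatrix} 1 & t_1 \\ 1 & t_2 \\ \vdots & \vdots \\ 1 & t_m \end{bmatrix}. \tag{5}$$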


The column space is so thin that almost certainly b is outside of it. When b happens to lie 
in the column space, the points happen to lie on a line. In that case $b = p$. Then $Ax = b$
is solvable and the errors are $e = (0, \ldots, 0)$.

The closest line $C + Dt$ has heights $p_1, \ldots, p_m$ with errors $e_1, \ldots, e_m$.
Solve $A^T A\hat{x} = A^T b$ for $\hat{x} = (C, D)$. The errors are $e_i = b_i - C - Dt_i$.

Fitting points by a straight line is so important that we give the two equations
$A^T A\hat{x} = A^T b$, once and for all. The two columns of A are independent (unless all times $t_i$ are the
same). So we turn to least squares and solve $A^T A\hat{x} = A^T b$.


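$$\text{Dot-product matrix}\quad A^T A = \begin{bmatrix} 1 & \cdots & 1 \\ t_1 & \cdots & t_m \end{bmatrix} \begin{bmatrix} 1 & t_1 \\ \vdots & \vdots \\ 1 & t_m \end{bmatrix} = \begin{bmatrix} m & \sum t_i \\ \sum t_i & \sum t_i^2 \end{bmatrix}. \tag{6}$$

On the right side of the normal equation is the 2 by 1 vector $A^T b$:

$$A^T b = \begin{bmatrix} 1 & \cdots & 1 \\ t_1 & \cdots & t_m \end{bmatrix} \begin{bmatrix} b_1 \\ \vdots \\ b_m \end{bmatrix} = \begin{bmatrix} \sum b_i \\ \sum t_i b_i \end{bmatrix}. \tag{7}$$

In a specific problem, these numbers are given. The best $\hat{x} = (C, D)$ is $(A^T A)^{-1} A^T b$.

The line $C + Dt$ minimizes $e_1^2 + \cdots + e_m^2 = \|Ax - b\|^2$ when $A^T A\hat{x} = A^T b$:

$$\begin{bmatrix} m & \sum t_i \\ \sum t_i & \sum t_i^2 \end{bmatrix} \begin{bmatrix} C \\ D \end{bmatrix} = \begin{bmatrix} \sum b_i \\ \sum t_i b_i \end{bmatrix}. \tag{8}$$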


The vertical errors at the m points on the line are the components of e = b — p. This 
error vector (the residual) $b - A\hat{x}$ is perpendicular to the columns of A (geometry). The
error is in the nullspace of $A^T$ (linear algebra). The best $\hat{x} = (C, D)$ minimizes the total
error E, the sum of squares (calculus):

$$E(x) = \|Ax - b\|^2 = (C + Dt_1 - b_1)^2 + \cdots + (C + Dt_m - b_m)^2.$$

Calculus sets the derivatives $\partial E/\partial C$ and $\partial E/\partial D$ to zero, and produces $A^T A\hat{x} = A^T b$.

Other least squares problems have more than two unknowns. Fitting by the best parabola
has n = 3 coefficients C, D, E (see below). In general we are fitting m data points
by n parameters $x_1, \ldots, x_n$. The matrix A has n columns and n < m. The derivatives
of $\|Ax - b\|^2$ give the n equations $A^T A\hat{x} = A^T b$. The derivative of a square is linear.
This is why the method of least squares is so popular.
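The geometry, linear algebra and calculus views all lead to the same 2 by 2 system. As a minimal sketch (the function name `fit_line` and the closed-form solve are my own choices, not the book's), the code below forms the four sums in the normal equations for a line, solves for C and D, and then checks that the error vector is perpendicular to both columns of A. The data are the three points from the first example of this section.

```python
from fractions import Fraction

def fit_line(times, heights):
    """Solve the 2 by 2 normal equations for the line C + D t (exact arithmetic)."""
    m = len(times)
    sum_t = sum(times)
    sum_tt = sum(t * t for t in times)
    sum_b = sum(heights)
    sum_tb = sum(t * b for t, b in zip(times, heights))
    det = m * sum_tt - sum_t * sum_t          # determinant of A^T A
    if det == 0:
        raise ValueError("all times are equal: the columns of A are dependent")
    C = Fraction(sum_b * sum_tt - sum_t * sum_tb) / det
    D = Fraction(m * sum_tb - sum_t * sum_b) / det
    return C, D

times, heights = [0, 1, 2], [6, 0, 0]
C, D = fit_line(times, heights)

# Errors e_i = b_i - C - D t_i, then the two dot products in A^T e.
errors = [b - (C + D * t) for t, b in zip(times, heights)]
dot_with_ones = sum(errors)
dot_with_times = sum(t * e for t, e in zip(times, errors))

print(C, D)                            # 5 -3
print(dot_with_ones, dot_with_times)   # 0 0
```

Both dot products are zero, which is the statement $A^T e = 0$ for this small case.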




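Example 2 A has orthogonal columns when the measurement times $t_i$ add to zero.

Suppose $b = 1, 2, 4$ at times $t = -2, 0, 2$. Those times add to zero. The columns of A have
zero dot product: $(1, 1, 1)$ is orthogonal to $(-2, 0, 2)$:

$$\begin{aligned} C + D(-2) &= 1 \\ C + D(0) &= 2 \\ C + D(2) &= 4 \end{aligned} \quad\text{or}\quad Ax = \begin{bmatrix} 1 & -2 \\ 1 & 0 \\ 1 & 2 \end{bmatrix} \begin{bmatrix} C \\ D \end{bmatrix} = \begin{bmatrix} 1 \\ 2 \\ 4 \end{bmatrix}.$$

When the columns of A are orthogonal, $A^T A$ will be a diagonal matrix:

$$A^T A\hat{x} = A^T b \quad\text{is}\quad \begin{bmatrix} 3 & 0 \\ 0 & 8 \end{bmatrix} \begin{bmatrix} C \\ D \end{bmatrix} = \begin{bmatrix} 7 \\ 6 \end{bmatrix}. \tag{9}$$

Main point: Since $A^T A$ is diagonal, we can solve separately for $C = \frac{7}{3}$ and $D = \frac{6}{8}$. The
zeros in $A^T A$ are dot products of perpendicular columns in A. The diagonal matrix $A^T A$,
with entries $m = 3$ and $t_1^2 + t_2^2 + t_3^2 = 8$, is virtually as good as the identity matrix.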


Orthogonal columns are so helpful that it is worth shifting the times by subtracting the
average time $\hat{t} = (t_1 + \cdots + t_m)/m$. If the original times were 1, 3, 5 then their average is
$\hat{t} = 3$. The shifted times $T = t - \hat{t} = t - 3$ add up to zero!

$$\begin{aligned} T_1 &= 1 - 3 = -2 \\ T_2 &= 3 - 3 = 0 \\ T_3 &= 5 - 3 = 2 \end{aligned} \qquad A_{\text{new}} = \begin{bmatrix} 1 & T_1 \\ 1 & T_2 \\ 1 & T_3 \end{bmatrix} \qquad A_{\text{new}}^T A_{\text{new}} = \begin{bmatrix} 3 & 0 \\ 0 & 8 \end{bmatrix}$$

Now C and D come from the easy equation (9). Then the best straight line uses $C + DT$
which is $C + D(t - \hat{t}) = C + D(t - 3)$. Problem 30 even gives a formula for C and D.

That was a perfect example of the “Gram-Schmidt idea” coming in the next section:

Make the columns orthogonal in advance. Then $A_{\text{new}}^T A_{\text{new}}$ is diagonal and $\hat{x}_{\text{new}}$ is easy.
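Here is a small sketch of the time-shifting idea. It assumes the heights 1, 2, 4 from Example 2 are measured at the original times 1, 3, 5 (that pairing is my assumption for illustration), and the function name `fit_line_shifted` is hypothetical. After the shift the two unknowns are found by two separate divisions, with no 2 by 2 system to solve.

```python
from fractions import Fraction

def fit_line_shifted(times, heights):
    """Fit C + D (t - t_avg): the shifted times T are orthogonal to (1, ..., 1)."""
    m = len(times)
    t_avg = Fraction(sum(times), m)
    T = [t - t_avg for t in times]            # shifted times add to zero
    C = Fraction(sum(heights), m)             # first diagonal equation: m C = sum of b
    # Second diagonal equation: (sum of T^2) D = sum of b T.
    # If all times are equal this divides by zero (dependent columns).
    D = sum(b * Ti for b, Ti in zip(heights, T)) / sum(Ti * Ti for Ti in T)
    return C, D, t_avg

C, D, t_avg = fit_line_shifted([1, 3, 5], [1, 2, 4])
print(C, D, t_avg)          # 7/3 3/4 3

# The same line written in the original time variable: intercept + D t.
intercept = C - D * t_avg
print(intercept)            # 1/12
```

The value D = 3/4 is the same fraction as 6/8 in equation (9), and the intercept 1/12 is what the unshifted normal equations would give for these data.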


Dependent Columns in A: What is $\hat{x}$?


From the start, this chapter has assumed independent columns in A. Then $A^T A$ is invertible.
Then $A^T A\hat{x} = A^T b$ produces the least squares solution to $Ax = b$.
Which $\hat{x}$ is best if A has dependent columns? Here is a specific example.


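[Equation and figure not recovered: $Ax = b$ for the two measurements $b_1 = 3$ and $b_2 = 1$ at the same time $T$, with dashed lines showing possible fits.]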


The measurements $b_1 = 3$ and $b_2 = 1$ are at the same time $T$! A straight line $C + Dt$
cannot go through both points. I think we are right to project $b = (3, 1)$ to $p = (2, 2)$ in
the column space of A. That changes the equation $Ax = b$ to the equation $A\hat{x} = p$.
An equation with no solution has become an equation with infinitely many solutions.
The problem is that A has dependent columns and $(1, -1)$ is in its nullspace.

Which solution $\hat{x}$ should we choose? All the dashed lines in the figure have the same
two errors 1 and $-1$ at time $T$. Those errors $(1, -1) = e = b - p$ are as small as possible.
But this doesn't tell us which dashed line is best.

My instinct is to go for the horizontal line at height 2. If the equation for the best line
is $b = C + Dt$, then my choice will have $\hat{x}_1 = C = 2$ and $\hat{x}_2 = D = 0$. But what if
the line had been written as $b = ct + d$? This is equally correct (just reversing C and D).
Now the horizontal line has $\hat{x}_1 = c = 0$ and $\hat{x}_2 = d = 2$. I don't see any way out.

In Section 7.4, the “pseudoinverse” of A will choose the shortest solution to $A\hat{x} = p$.
Here, that shortest solution will be $x^+ = (1, 1)$. This is the particular solution in the row
space of A, and $x^+$ has length $\sqrt{2}$. (Both solutions $\hat{x} = (2, 0)$ and $(0, 2)$ have length 2.)
We are arbitrarily choosing the nullspace component of the solution $x^+$ to be zero.

When A has independent columns, the nullspace only contains the zero vector and the
pseudoinverse is our usual left inverse $L = (A^T A)^{-1} A^T$. When I write it that way, the
pseudoinverse sounds like the best way to choose $\hat{x}$.
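The numbers in this example can be reproduced by two one-dimensional projections. The sketch below is not a general pseudoinverse routine. It assumes the 2 by 2 matrix has both rows and both columns equal to (1, 1), which is what the nullspace vector (1, -1) and the solutions (2, 0), (0, 2) in the text imply, and the helper names are my own.

```python
import math
from fractions import Fraction

def dot(u, v):
    return sum(x * y for x, y in zip(u, v))

def project_onto_line(v, a):
    """Projection of v onto the line through a: (a.v / a.a) a."""
    coeff = Fraction(dot(a, v), dot(a, a))
    return [coeff * x for x in a]

def length(v):
    return math.sqrt(dot(v, v))

a = [1, 1]          # the repeated column of A (assumed), also its repeated row
b = [3, 1]

p = project_onto_line(b, a)              # projection onto the column space
e = [bi - pi for bi, pi in zip(b, p)]    # errors at the two measurements

# (2, 0) and (0, 2) both solve A x = p. Projecting either one onto the
# row space (the line through (1, 1)) removes its nullspace component.
x_plus = project_onto_line([2, 0], a)

print(p, e)                              # p is (2, 2) and e is (1, -1)
print(length(x_plus), length([2, 0]))    # about 1.414 versus 2.0
```

Starting from (0, 2) gives the same `x_plus`, since the two solutions differ only by a multiple of the nullspace vector (1, -1).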


Comment MATLAB experiments with singular matrices produced either Inf or NaN
(Not a Number) or $10^{16}$ (a bad number). There is a warning in every case! I believe that Inf
and NaN and $10^{16}$ come from the possibilities $0x = b$ and $0x = 0$ and $10^{-16}x = 1$.

Those are three small examples of three big difficulties: singular with no solution,
singular with many solutions, and very very close to singular.

Fitting by a Parabola


If we throw a ball, it would be crazy to fit the path by a straight line. A parabola
$b = C + Dt + Et^2$ allows the ball to go up and come down again (b is the height at time t).
The actual path is not a perfect parabola, but the whole theory of projectiles starts with that
approximation.

When Galileo dropped a stone from the Leaning Tower of Pisa, it accelerated.
The distance contains a quadratic term $\frac{1}{2}gt^2$. (Galileo's point was that the stone's mass is
not involved.) Without that $t^2$ term we could never send a satellite into its orbit.
But even with a nonlinear function like $t^2$, the unknowns C, D, E still appear linearly!
Fitting points by the best parabola is still a problem in linear algebra.


Problem Fit heights $b_1, \ldots, b_m$ at times $t_1, \ldots, t_m$ by a parabola $C + Dt + Et^2$.


Solution With m > 3 points, the m equations for an exact fit are generally unsolvable:

$$\begin{aligned} C + Dt_1 + Et_1^2 &= b_1 \\ &\;\;\vdots \\ C + Dt_m + Et_m^2 &= b_m \end{aligned} \quad\text{is } Ax = b \text{ with the m by 3 matrix}\quad A = \begin{bmatrix} 1 & t_1 & t_1^2 \\ \vdots & \vdots & \vdots \\ 1 & t_m & t_m^2 \end{bmatrix}. \tag{10}$$

Least squares The closest parabola $C + Dt + Et^2$ chooses $\hat{x} = (C, D, E)$ to
satisfy the three normal equations $A^T A\hat{x} = A^T b$.


May I ask you to convert this to a problem of projection? The column space of A has
dimension ___. The projection of b is $p = A\hat{x}$, which combines the three columns
using the coefficients C, D, E. The error at the first data point is $e_1 = b_1 - C - Dt_1 - Et_1^2$.
The total squared error is $e_1^2 +$ ___. If you prefer to minimize by calculus, take the
partial derivatives of E with respect to ___, ___, ___. These three derivatives will
be zero when $\hat{x} = (C, D, E)$ solves the 3 by 3 system of equations $A^T A\hat{x} = A^T b$.

Section 10.5 has more least squares applications. The big one is Fourier series:
approximating functions instead of vectors. The function to be minimized changes from a
sum of squared errors $e_1^2 + \cdots + e_m^2$ to an integral of the squared error.


Example 3 For a parabola $b = C + Dt + Et^2$ to go through the three heights $b = 6, 0, 0$
when $t = 0, 1, 2$, the equations for C, D, E are

$$\begin{aligned} C + D \cdot 0 + E \cdot 0^2 &= 6 \\ C + D \cdot 1 + E \cdot 1^2 &= 0 \\ C + D \cdot 2 + E \cdot 2^2 &= 0. \end{aligned} \tag{11}$$

This is $Ax = b$. We can solve it exactly. Three data points give three equations and
a square matrix. The solution is $x = (C, D, E) = (6, -9, 3)$. The parabola through the
three points in Figure 4.8a is $b = 6 - 9t + 3t^2$.

What does this mean for projection? The matrix has three columns, which span the
whole space $\mathbf{R}^3$. The projection matrix is the identity. The projection of b is b. The error
is zero. We didn't need $A^T A\hat{x} = A^T b$, because we solved $Ax = b$. Of course we could
multiply by $A^T$, but there is no reason to do it.

Figure 4.8 also shows a fourth point $b_4$ at time $t_4$. If that falls on the parabola, the new
$Ax = b$ (four equations) is still solvable. When the fourth point is not on the parabola, we
turn to $A^T A\hat{x} = A^T b$. Will the least squares parabola stay the same, with all the error at
the fourth point? Not likely!

Least squares balances the four errors to get three equations for C, D, E.
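To see both cases side by side, the following sketch first solves the three exact equations of Example 3 and then adds a fourth data point. The fourth point (height 0 at time 3) is a hypothetical choice of mine, not a value from Figure 4.8, and `solve` is a small Gauss-Jordan helper written for this illustration.

```python
from fractions import Fraction

def solve(M, r):
    """Solve the square system M x = r exactly by Gauss-Jordan elimination.
    Raises StopIteration if no pivot is found (singular matrix)."""
    n = len(M)
    aug = [[Fraction(v) for v in row] + [Fraction(rv)] for row, rv in zip(M, r)]
    for col in range(n):
        pivot = next(i for i in range(col, n) if aug[i][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pv = aug[col][col]
        aug[col] = [v / pv for v in aug[col]]
        for i in range(n):
            if i != col and aug[i][col] != 0:
                f = aug[i][col]
                aug[i] = [vi - f * vc for vi, vc in zip(aug[i], aug[col])]
    return [row[n] for row in aug]

def parabola_matrix(times):
    return [[1, t, t * t] for t in times]

# Three points: square matrix, exact solution, zero error.
exact = solve(parabola_matrix([0, 1, 2]), [6, 0, 0])

# Four points (the fourth one is hypothetical): solve the normal equations.
times4, b4 = [0, 1, 2, 3], [6, 0, 0, 0]
A4 = parabola_matrix(times4)
AtA = [[sum(row[i] * row[j] for row in A4) for j in range(3)] for i in range(3)]
Atb = [sum(row[i] * bi for row, bi in zip(A4, b4)) for i in range(3)]
fitted = solve(AtA, Atb)
errors = [bi - sum(c * a for c, a in zip(fitted, row)) for row, bi in zip(A4, b4)]

print(exact == [6, -9, 3])   # True
print(fitted == exact)       # False: the parabola changes
```

With this particular fourth point all four entries of `errors` are nonzero, so the error is shared among the points instead of sitting entirely at the new one.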


[Figure 4.8: the parabola $b = 6 - 9t + 3t^2$ through the three data points, with a fourth point off the curve.]

Figure 4.8: An exact fit of the parabola at $t = 0, 1, 2$ means that $p = b$ and $e = 0$. The
fourth point (×) off the parabola makes m > n and we need least squares: project b on
$C(A)$. The figure on the right shows b, which is not a combination of the three columns of A.


REVIEW OF THE KEY IDEAS


1. The least squares solution $\hat{x}$ minimizes $\|Ax - b\|^2 = x^T A^T A x - 2x^T A^T b + b^T b$.
This is $E$, the sum of squares of the errors in the m equations (m > n).

2. The best $\hat{x}$ comes from the normal equations $A^T A\hat{x} = A^T b$. $E$ is a minimum.

3. To fit m points by a line $b = C + Dt$, the normal equations give C and D.

4. The heights of the best line are $p = (p_1, \ldots, p_m)$. The vertical distances to the data
points are the errors $e = (e_1, \ldots, e_m)$. A key equation is $A^T e = 0$.

5. If we try to fit m points by a combination of n < m functions, the m equations
$Ax = b$ are generally unsolvable. The n equations $A^T A\hat{x} = A^T b$ give the least
squares solution: the combination with smallest MSE (mean square error).

WORKED EXAMPLES


4.3 A Start with nine measurements $b_1$ to $b_9$, all zero, at times $t = 1, \ldots, 9$. The
tenth measurement $b_{10} = 40$ is an outlier. Find the best horizontal line $y = C$ to fit
the ten points $(1, 0), (2, 0), \ldots, (9, 0), (10, 40)$ using three options for the error E:

(1) Least squares $E_2 = e_1^2 + \cdots + e_{10}^2$ (then the normal equation for C is linear)
(2) Least maximum error $E_\infty = |e_{\max}|$
(3) Least sum of errors $E_1 = |e_1| + \cdots + |e_{10}|$.

Solution (1) The least squares fit to $0, 0, \ldots, 0, 40$ by a horizontal line is $C = 4$:

A = column of 1's, $A^T A = 10$, $A^T b$ = sum of $b_i$ = 40. So $10C = 40$.

(2) The least maximum error requires $C = 20$, halfway between 0 and 40.

(3) The least sum requires $C = 0$ (!!). The sum of errors $9|C| + |40 - C|$ would increase
if C moves up from zero.

The least sum comes from the median measurement (the median of $0, \ldots, 0, 40$ is zero).
Many statisticians feel that the least squares solution is too heavily influenced by outliers
like $b_{10} = 40$, and they prefer least sum. But the equations become nonlinear.
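The three answers C = 4, 20, 0 can be confirmed by brute force. The sketch below is only a check under an assumption of mine: it searches the integer heights 0 to 40, which happens to contain all three minimizers for this data set. It also compares them with the mean, the midrange and the median.

```python
from statistics import mean, median

b = [0] * 9 + [40]   # nine zeros and the outlier

def E2(C):     # sum of squared errors
    return sum((x - C) ** 2 for x in b)

def Emax(C):   # largest absolute error
    return max(abs(x - C) for x in b)

def E1(C):     # sum of absolute errors
    return sum(abs(x - C) for x in b)

grid = range(0, 41)   # candidate heights (an illustrative integer grid)
best_squares = min(grid, key=E2)
best_max = min(grid, key=Emax)
best_sum = min(grid, key=E1)

print(best_squares, best_max, best_sum)             # 4 20 0
print(mean(b), (min(b) + max(b)) / 2, median(b))    # 4, 20.0 and the median 0
```

Least squares follows the mean, least maximum error follows the midrange, and least sum follows the median, which is why the outlier has no effect on option (3).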

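Now find the least squares line $C + Dt$ through those ten points $(1, 0)$ to $(10, 40)$:

$$A^T A = \begin{bmatrix} m & \sum t_i \\ \sum t_i & \sum t_i^2 \end{bmatrix} = \begin{bmatrix} 10 & 55 \\ 55 & 385 \end{bmatrix} \qquad A^T b = \begin{bmatrix} \sum b_i \\ \sum t_i b_i \end{bmatrix} = \begin{bmatrix} 40 \\ 400 \end{bmatrix}$$

Those come from equation (8). Then $A^T A\hat{x} = A^T b$ gives $C = -8$ and $D = 24/11$.


What happens to C and D if you multiply $b = (0, 0, \ldots, 40)$ by 3 and then add 30 to
get $b_{\text{new}} = (30, 30, \ldots, 150)$? Linearity allows us to rescale b. Multiplying b by 3 will
multiply C and D by 3. Adding 30 to all $b_i$ will add 30 to C.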
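A short check of the ten-point line and of the rescaling claim, written as a sketch with my own helper name `fit_line` (it solves the 2 by 2 system of equation (8) in closed form):

```python
from fractions import Fraction

def fit_line(times, heights):
    """Solve [m, sum t; sum t, sum t^2] [C; D] = [sum b; sum t b] exactly."""
    m = len(times)
    sum_t = sum(times)
    sum_tt = sum(t * t for t in times)
    sum_b = sum(heights)
    sum_tb = sum(t * b for t, b in zip(times, heights))
    det = m * sum_tt - sum_t * sum_t
    C = Fraction(sum_b * sum_tt - sum_t * sum_tb) / det
    D = Fraction(m * sum_tb - sum_t * sum_b) / det
    return C, D

times = list(range(1, 11))
b = [0] * 9 + [40]
C, D = fit_line(times, b)
print(C, D)             # -8 24/11

# Rescale: multiply b by 3, then add 30 to every entry.
b_new = [3 * x + 30 for x in b]
C_new, D_new = fit_line(times, b_new)
print(C_new, D_new)     # 6 72/11
```

The new intercept is 3(-8) + 30 = 6 and the new slope is 3(24/11) = 72/11, as linearity predicts.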


4.3 B Find the parabola $C + Dt + Et^2$ that comes closest (least squares error) to the
values $b = (0, 0, 1, 0, 0)$ at the times $t = -2, -1, 0, 1, 2$. First write down the five equations
$Ax = b$ in three unknowns $x = (C, D, E)$ for a parabola to go through the five points. No
solution because no such parabola exists. Solve $A^T A\hat{x} = A^T b$.

I would predict $D = 0$. Why should the best parabola be symmetric around $t = 0$?
In $A^T A\hat{x} = A^T b$, equation 2 for D should uncouple from equations 1 and 3.


Solution The five equations $Ax = b$ have a rectangular Vandermonde matrix A:

$$\begin{aligned} C + D(-2) + E(-2)^2 &= 0 \\ C + D(-1) + E(-1)^2 &= 0 \\ C + D(0) + E(0)^2 &= 1 \\ C + D(1) + E(1)^2 &= 0 \\ C + D(2) + E(2)^2 &= 0 \end{aligned} \qquad A = \begin{bmatrix} 1 & -2 & 4 \\ 1 & -1 & 1 \\ 1 & 0 & 0 \\ 1 & 1 & 1 \\ 1 & 2 & 4 \end{bmatrix} \qquad A^T A = \begin{bmatrix} 5 & 0 & 10 \\ 0 & 10 & 0 \\ 10 & 0 & 34 \end{bmatrix}$$

Those zeros in $A^T A$ mean that column 2 of A is orthogonal to columns 1 and 3. We
see this directly in A (the times $-2, -1, 0, 1, 2$ are symmetric). The best C, D, E in the
parabola $C + Dt + Et^2$ come from $A^T A\hat{x} = A^T b$, and D is uncoupled from C and E:

$$\begin{bmatrix} 5 & 0 & 10 \\ 0 & 10 & 0 \\ 10 & 0 & 34 \end{bmatrix} \begin{bmatrix} C \\ D \\ E \end{bmatrix} = \begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix} \quad\text{leads to}\quad \begin{aligned} C &= 34/70 \\ D &= 0 \text{ as predicted} \\ E &= -10/70 \end{aligned}$$
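Worked Example 4.3 B can be verified exactly with rational arithmetic. The helper `solve` below is a small Gauss-Jordan routine written for this sketch; it is an illustration, not a method the text prescribes.

```python
from fractions import Fraction

def solve(M, r):
    """Solve the square system M x = r exactly by Gauss-Jordan elimination.
    Raises StopIteration if no pivot is found (singular matrix)."""
    n = len(M)
    aug = [[Fraction(v) for v in row] + [Fraction(rv)] for row, rv in zip(M, r)]
    for col in range(n):
        pivot = next(i for i in range(col, n) if aug[i][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pv = aug[col][col]
        aug[col] = [v / pv for v in aug[col]]
        for i in range(n):
            if i != col and aug[i][col] != 0:
                f = aug[i][col]
                aug[i] = [vi - f * vc for vi, vc in zip(aug[i], aug[col])]
    return [row[n] for row in aug]

times = [-2, -1, 0, 1, 2]
b = [0, 0, 1, 0, 0]
A = [[1, t, t * t] for t in times]        # rectangular Vandermonde matrix

AtA = [[sum(row[i] * row[j] for row in A) for j in range(3)] for i in range(3)]
Atb = [sum(row[i] * bi for row, bi in zip(A, b)) for i in range(3)]
C, D, E = solve(AtA, Atb)

print(AtA)        # [[5, 0, 10], [0, 10, 0], [10, 0, 34]]
print(C, D, E)    # 17/35 0 -1/7
```

The printed fractions are 34/70 and -10/70 in lowest terms.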
Problem Set 4.3 


Problems 1-11 use four data points b = (0,8, 8, 20) to bring out the key ideas. 


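[Figure 4.9 labels: the data vector $b = (0, 8, 8, 20)$, its projection $p = Ca_1 + Da_2$ and the error $e$, the columns $a_1 = (1, 1, 1, 1)$ and $a_2 = (0, 1, 3, 4)$, and the times $t_1 = 0$, $t_2 = 1$, $t_3 = 3$, $t_4 = 4$.]

Figure 4.9: Problems 1-11: The closest line $C + Dt$ matches $Ca_1 + Da_2$ in $\mathbf{R}^4$.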


1 With $b = 0, 8, 8, 20$ at $t = 0, 1, 3, 4$, set up and solve the normal equations
$A^T A\hat{x} = A^T b$. For the best straight line in Figure 4.9a, find its four heights $p_i$
and four errors $e_i$. What is the minimum value $E = e_1^2 + e_2^2 + e_3^2 + e_4^2$?


2 (Line $C + Dt$ does go through p's) With $b = 0, 8, 8, 20$ at times $t = 0, 1, 3, 4$,
write down the four equations $Ax = b$ (unsolvable). Change the measurements to
$p = 1, 5, 13, 17$ and find an exact solution to $A\hat{x} = p$.


3 Check that $e = b - p = (-1, 3, -5, 3)$ is perpendicular to both columns of the
same matrix A. What is the shortest distance $\|e\|$ from b to the column space of A?


4 (By calculus) Write down $E = \|Ax - b\|^2$ as a sum of four squares. The last one
is $(C + 4D - 20)^2$. Find the derivative equations $\partial E/\partial C = 0$ and $\partial E/\partial D = 0$.
Divide by 2 to obtain the normal equations $A^T A\hat{x} = A^T b$.


5 Find the height C of the best horizontal line to fit $b = (0, 8, 8, 20)$. An exact fit
would solve the unsolvable equations $C = 0$, $C = 8$, $C = 8$, $C = 20$. Find the
4 by 1 matrix A in these equations and solve $A^T A\hat{x} = A^T b$. Draw the horizontal
line at height $\hat{x} = C$ and the four errors in e.


6 Project $b = (0, 8, 8, 20)$ onto the line through $a = (1, 1, 1, 1)$. Find $\hat{x} = a^T b / a^T a$
and the projection $p = \hat{x}a$. Check that $e = b - p$ is perpendicular to a, and find the
shortest distance $\|e\|$ from b to the line through a.


7 Find the closest line $b = Dt$, through the origin, to the same four points. An exact
fit would solve $D \cdot 0 = 0$, $D \cdot 1 = 8$, $D \cdot 3 = 8$, $D \cdot 4 = 20$. Find the 4 by 1 matrix
and solve $A^T A\hat{x} = A^T b$. Redraw Figure 4.9a showing the best line $b = Dt$ and the
e's.


8 Project $b = (0, 8, 8, 20)$ onto the line through $a = (0, 1, 3, 4)$. Find $\hat{x} = D$ and
$p = \hat{x}a$. The best C in Problems 5-6 and the best D in Problems 7-8 do not agree
with the best $(C, D)$ in Problems 1-4. That is because $(1, 1, 1, 1)$ and $(0, 1, 3, 4)$ are
___ perpendicular.


9 For the closest parabola $b = C + Dt + Et^2$ to the same four points, write down the
unsolvable equations $Ax = b$ in three unknowns $x = (C, D, E)$. Set up the three
normal equations $A^T A\hat{x} = A^T b$ (solution not required). In Figure 4.9a you are now
fitting a parabola to 4 points. What is happening in Figure 4.9b?


10 For the closest cubic $b = C + Dt + Et^2 + Ft^3$ to the same four points, write down
the four equations $Ax = b$. Solve them by elimination. In Figure 4.9a this cubic
now goes exactly through the points. What are p and e?


11 The average of the four times is $\hat{t} = \frac{1}{4}(0 + 1 + 3 + 4) = 2$. The average of the
four b's is $\hat{b} = \frac{1}{4}(0 + 8 + 8 + 20) = 9$.

(a) Verify that the best line goes through the center point $(\hat{t}, \hat{b}) = (2, 9)$.
(b) Explain why $C + D\hat{t} = \hat{b}$ comes from the first equation in $A^T A\hat{x} = A^T b$.


Questions 12-16 introduce basic ideas of statistics, the foundation for least squares.


12 (Recommended) This problem projects $b = (b_1, \ldots, b_m)$ onto the line through
$a = (1, \ldots, 1)$. We solve m equations $ax = b$ in 1 unknown (by least squares).

(a) Solve $a^T a\hat{x} = a^T b$ to show that $\hat{x}$ is the mean (the average) of the b's.

(b) Find $e = b - a\hat{x}$ and the variance $\|e\|^2$ and the standard deviation $\|e\|$.

(c) The horizontal line $\hat{b} = 3$ is closest to $b = (1, 2, 6)$. Check that $p = (3, 3, 3)$ is
perpendicular to e and find the 3 by 3 projection matrix P.


13 First assumption behind least squares: $Ax = b\ -$ (noise e with mean zero). Multiply
the error vectors $e = b - Ax$ by $(A^T A)^{-1} A^T$ to get $\hat{x} - x$ on the right. The
estimation errors $\hat{x} - x$ also average to zero. The estimate $\hat{x}$ is unbiased.


14 Second assumption behind least squares: The m errors $e_i$ are independent with
variance $\sigma^2$, so the average of $(b - Ax)(b - Ax)^T$ is $\sigma^2 I$. Multiply on the left by
$(A^T A)^{-1} A^T$ and on the right by $A(A^T A)^{-1}$ to show that the average matrix
$(\hat{x} - x)(\hat{x} - x)^T$ is $\sigma^2 (A^T A)^{-1}$. This is the covariance matrix W in Section 10.2.


15 A doctor takes 4 readings of your heart rate. The best solution to $x = b_1, \ldots, x = b_4$
is the average $\hat{x}$ of $b_1, \ldots, b_4$. The matrix A is a column of 1's. Problem 14 gives
the expected error $(\hat{x} - x)^2$ as $\sigma^2 (A^T A)^{-1} =$ ___. By averaging, the variance
drops from $\sigma^2$ to $\sigma^2/4$.


16 If you know the average $\hat{x}_9$ of 9 numbers $b_1, \ldots, b_9$, how can you quickly find the
average $\hat{x}_{10}$ with one more number $b_{10}$? The idea of recursive least squares is to
avoid adding 10 numbers. What number multiplies $\hat{x}_9$ in computing $\hat{x}_{10}$?

$\hat{x}_{10} = \frac{1}{10} b_{10} +$ ___ $\hat{x}_9 = \frac{1}{10}(b_1 + \cdots + b_{10})$ as in Worked Example 4.2 C.


Questions 17-24 give more practice with $\hat{x}$ and p and e.


17 Write down three equations for the line $b = C + Dt$ to go through $b = 7$ at $t = -1$,
$b = 7$ at $t = 1$, and $b = 21$ at $t = 2$. Find the least squares solution $\hat{x} = (C, D)$ and
draw the closest line.


18 Find the projection $p = A\hat{x}$ in Problem 17. This gives the three heights of the closest
line. Show that the error vector is $e = (2, -6, 4)$. Why is $Pe = 0$?


19 Suppose the measurements at $t = -1, 1, 2$ are the errors $2, -6, 4$ in Problem 18.
Compute $\hat{x}$ and the closest line to these new measurements. Explain the answer:
$b = (2, -6, 4)$ is perpendicular to ___ so the projection is $p = 0$.


20 Suppose the measurements at $t = -1, 1, 2$ are $b = (5, 13, 17)$. Compute $\hat{x}$ and the
closest line and e. The error is $e = 0$ because this b is ___.


21 Which of the four subspaces contains the error vector e? Which contains p? Which
contains $\hat{x}$? What is the nullspace of A?


22 Find the best line $C + Dt$ to fit $b = 4, 2, -1, 0, 0$ at times $t = -2, -1, 0, 1, 2$.


23 Is the error vector e orthogonal to b or p or e or $\hat{x}$? Show that $\|e\|^2$ equals $e^T b$
which equals $b^T b - p^T b$. This is the smallest total error E.


24 The partial derivatives of $\|Ax\|^2$ with respect to $x_1, \ldots, x_n$ fill the vector $2A^T Ax$.
The derivatives of $2b^T Ax$ fill the vector $2A^T b$. So the derivatives of $\|Ax - b\|^2$ are
zero when ___.


Challenge Problems


25 What condition on $(t_1, b_1), (t_2, b_2), (t_3, b_3)$ puts those three points onto a straight
line? A column space answer is: $(b_1, b_2, b_3)$ must be a combination of $(1, 1, 1)$ and
$(t_1, t_2, t_3)$. Try to reach a specific equation connecting the t's and b's. I should have
thought of this question sooner!


26 Find the plane that gives the best fit to the 4 values $b = (0, 1, 3, 4)$ at the corners
$(1, 0)$ and $(0, 1)$ and $(-1, 0)$ and $(0, -1)$ of a square. The equations $C + Dx + Ey = b$
at those 4 points are $Ax = b$ with 3 unknowns $x = (C, D, E)$. What is A?
At the center $(0, 0)$ of the square, show that $C + Dx + Ey$ = average of the b's.


27 (Distance between lines) The points $P = (x, x, x)$ and $Q = (y, 3y, -1)$ are on two
lines in space that don't meet. Choose x and y to minimize the squared distance
$\|P - Q\|^2$. The line connecting the closest P and Q is perpendicular to ___.


28 Suppose the columns of A are not independent. How could you find a matrix B so
that $P = B(B^T B)^{-1} B^T$ does give the projection onto the column space of A? (The
usual formula will fail when $A^T A$ is not invertible.)


29 Usually there will be exactly one hyperplane in $\mathbf{R}^n$ that contains the n given points
$x = 0, a_1, \ldots, a_{n-1}$. (Example for n = 3: There will be one plane containing
$0, a_1, a_2$ unless ___.) What is the test to have exactly one plane in $\mathbf{R}^n$?


30 Example 2 shifted the times $t_i$ to make them add to zero. We subtracted away the
average time $\hat{t} = (t_1 + \cdots + t_m)/m$ to get $T_i = t_i - \hat{t}$. Those $T_i$ add to zero.

With the columns $(1, \ldots, 1)$ and $(T_1, \ldots, T_m)$ now orthogonal, $A^T A$ is diagonal. Its
entries are $m$ and $T_1^2 + \cdots + T_m^2$. Show that the best C and D have direct formulas:

$$T \text{ is } t - \hat{t} \qquad C = \frac{b_1 + \cdots + b_m}{m} \quad\text{and}\quad D = \frac{b_1 T_1 + \cdots + b_m T_m}{T_1^2 + \cdots + T_m^2}.$$

The best line is $C + DT$ or $C + D(t - \hat{t})$. The time shift that makes $A^T A$ diagonal
is an example of the Gram-Schmidt process: orthogonalize the columns of A in
advance.

4.4 Orthonormal Bases and Gram-Schmidt


1 The columns $q_1, \ldots, q_n$ are orthonormal if $q_i^T q_j = \begin{cases} 0 & \text{for } i \neq j \\ 1 & \text{for } i = j \end{cases}$. Then $Q^T Q = I$.

2 If Q is also square, then $QQ^T = I$ and Q is an “orthogonal matrix”.

3 The least squares solution to $Qx = b$ is $\hat{x} = Q^T b$. Projection of b: $p = QQ^T b = Pb$.

4 The Gram-Schmidt process takes independent $a_i$ to orthonormal $q_i$. Start with $q_1 = a_1/\|a_1\|$.

5 $q_i$ is $(a_i - \text{projection } p_i)/\|a_i - p_i\|$; projection $p_i = (a_i^T q_1)q_1 + \cdots + (a_i^T q_{i-1})q_{i-1}$.

6 Each $a_i$ will be a combination of $q_1$ to $q_i$. Then $A = QR$: orthogonal Q and triangular R.


This section has two goals, why and how. The first is to see why orthogonality is good. 
Dot products are zero, so $A^T A$ will be diagonal. It becomes so easy to find $\hat{x}$ and $p = A\hat{x}$.
The second goal is to construct orthogonal vectors. You will see how Gram-Schmidt 
chooses combinations of the original basis vectors to produce right angles. Those original 
vectors are the columns of A, probably not orthogonal. The orthonormal basis vectors 
will be the columns of a new matrix Q. 


From Chapter 3, a basis consists of independent vectors that span the space. 
The basis vectors could meet at any angle (except 0° and 180°). But every time we visu- 
alize axes, they are perpendicular. In our imagination, the coordinate axes are practically 
always orthogonal. This simplifies the picture and it greatly simplifies the computations. 


The vectors $q_1, \ldots, q_n$ are orthogonal when their dot products $q_i \cdot q_j$ are zero. More
exactly $q_i^T q_j = 0$ whenever $i \neq j$. With one more step (just divide each vector by its
length) the vectors become orthogonal unit vectors. Their lengths are all 1 (normal).
Then the basis is called orthonormal. 


DEFINITION The vectors $q_1, \ldots, q_n$ are orthonormal if

$$q_i^T q_j = \begin{cases} 0 & \text{when } i \neq j \quad (\text{orthogonal vectors}) \\ 1 & \text{when } i = j \quad (\text{unit vectors: } \|q_i\| = 1) \end{cases}$$

A matrix with orthonormal columns is assigned the special letter Q.


The matrix Q is easy to work with because $Q^T Q = I$. This repeats in matrix language
that the columns $q_1, \ldots, q_n$ are orthonormal. Q is not required to be square.

A matrix Q with orthonormal columns satisfies $Q^T Q = I$:

$$Q^T Q = \begin{bmatrix} - \; q_1^T \; - \\ - \; q_2^T \; - \\ \vdots \\ - \; q_n^T \; - \end{bmatrix} \begin{bmatrix} | & | & & | \\ q_1 & q_2 & \cdots & q_n \\ | & | & & | \end{bmatrix} = \begin{bmatrix} 1 & 0 & \cdots & 0 \\ 0 & 1 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & 1 \end{bmatrix} = I. \tag{1}$$

When row i of $Q^T$ multiplies column j of Q, the dot product is $q_i^T q_j$. Off the diagonal
($i \neq j$) that dot product is zero by orthogonality. On the diagonal ($i = j$) the unit vectors
give $q_i^T q_i = \|q_i\|^2 = 1$. Often Q is rectangular (m > n). Sometimes m = n.


When Q is square, $Q^T Q = I$ means that $Q^T = Q^{-1}$: transpose = inverse.


If the columns are only orthogonal (not unit vectors), dot products still give a diagonal
matrix (not the identity matrix). This diagonal matrix is almost as good as $I$. The important
thing is orthogonality. Then it is easy to produce unit vectors.

To repeat: $Q^T Q = I$ even when Q is rectangular. In that case $Q^T$ is only an inverse
from the left. For square matrices we also have $QQ^T = I$, so $Q^T$ is the two-sided
inverse of Q. The rows of a square Q are orthonormal like the columns. The inverse is the
transpose. In this square case we call Q an orthogonal matrix.¹

Here are three examples of orthogonal matrices—rotation and permutation and reflec- 
tion. The quickest test is to check QTQ = I. 


Example 1 (Rotation) Q rotates every vector in the plane by the angle $\theta$:

$$Q = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix} \quad\text{and}\quad Q^T = Q^{-1} = \begin{bmatrix} \cos\theta & \sin\theta \\ -\sin\theta & \cos\theta \end{bmatrix}.$$

The columns of Q are orthogonal (take their dot product). They are unit vectors because
$\sin^2\theta + \cos^2\theta = 1$. Those columns give an orthonormal basis for the plane $\mathbf{R}^2$.

The standard basis vectors $i$ and $j$ are rotated through $\theta$ (see Figure 4.10a). $Q^{-1}$ rotates
vectors back through $-\theta$. It agrees with $Q^T$, because the cosine of $-\theta$ equals the cosine
of $\theta$, and $\sin(-\theta) = -\sin\theta$. We have $Q^T Q = I$ and $QQ^T = I$.
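A quick numerical check of these statements, as a sketch in floating point (the angle 0.7 is an arbitrary choice of mine and the helper names are hypothetical):

```python
import math

def rotation(theta):
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s], [s, c]]

def transpose(M):
    return [list(col) for col in zip(*M)]

def matmul(X, Y):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]

def close(X, Y, tol=1e-12):
    """Entrywise comparison that tolerates floating point roundoff."""
    return all(math.isclose(x, y, abs_tol=tol)
               for rx, ry in zip(X, Y) for x, y in zip(rx, ry))

theta = 0.7                      # arbitrary angle in radians
Q = rotation(theta)
I2 = [[1.0, 0.0], [0.0, 1.0]]

print(close(matmul(transpose(Q), Q), I2))     # True: Q^T Q = I
print(close(matmul(Q, transpose(Q)), I2))     # True: Q Q^T = I
print(close(transpose(Q), rotation(-theta)))  # True: Q^T rotates back by -theta
```

Any other angle gives the same three results, up to roundoff.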


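Example 2 (Permutation) These matrices change the order to $(y, z, x)$ and $(y, x)$:

$$\begin{bmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 1 & 0 & 0 \end{bmatrix} \begin{bmatrix} x \\ y \\ z \end{bmatrix} = \begin{bmatrix} y \\ z \\ x \end{bmatrix} \quad\text{and}\quad \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} y \\ x \end{bmatrix}.$$

All columns of these Q's are unit vectors (their lengths are obviously 1). They are also
orthogonal (the 1's appear in different places). The inverse of a permutation matrix is its
transpose: $Q^{-1} = Q^T$. The inverse puts the components back into their original order:

$$\text{Inverse = transpose:}\quad \begin{bmatrix} 0 & 0 & 1 \\ 1 & 0 & 0 \\ 0 & 1 & 0 \end{bmatrix} \begin{bmatrix} y \\ z \\ x \end{bmatrix} = \begin{bmatrix} x \\ y \\ z \end{bmatrix} \quad\text{and}\quad \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix} \begin{bmatrix} y \\ x \end{bmatrix} = \begin{bmatrix} x \\ y \end{bmatrix}.$$

Every permutation matrix is an orthogonal matrix.

¹ “Orthonormal matrix” would have been a better name for Q, but it's not used. Any matrix with
orthonormal columns has the letter Q. But we only call it an orthogonal matrix when it is square.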


Example 3 (Reflection) If u is any unit vector, set $Q = I - 2uu^T$. Notice that
$uu^T$ is a matrix while $u^T u$ is the number $\|u\|^2 = 1$. Then $Q^T$ and $Q^{-1}$ both equal Q:

$$Q^T = I - 2uu^T = Q \quad\text{and}\quad Q^T Q = I - 4uu^T + 4uu^T uu^T = I. \tag{2}$$

Reflection matrices $I - 2uu^T$ are symmetric and also orthogonal. If you square them, you
get the identity matrix: $Q^2 = Q^T Q = I$. Reflecting twice through a mirror brings back the
original, like $(-1)^2 = 1$. Notice $u^T u = 1$ inside $4uu^T uu^T$ in equation (2).


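[Figure 4.10 panels: (a) Rotate by $\theta$, with $Qi = (\cos\theta, \sin\theta)$ and $Qj = (-\sin\theta, \cos\theta)$; (b) Reflect $u$ to $-u$ across the mirror, with $Qi = j$ and $Qj = i$.]

Figure 4.10: Rotation by $Q = \begin{bmatrix} c & -s \\ s & c \end{bmatrix}$ and reflection across 45° by $Q = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}$.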


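As an example choose the direction $u = (-1/\sqrt{2}, 1/\sqrt{2})$. Compute $2uu^T$ (column times
row) and subtract from $I$ to get the reflection matrix Q in the direction of u:

$$\text{Reflection}\quad Q = I - 2\begin{bmatrix} .5 & -.5 \\ -.5 & .5 \end{bmatrix} = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix} \quad\text{and}\quad \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} y \\ x \end{bmatrix}.$$

When $(x, y)$ goes to $(y, x)$, a vector like $(3, 3)$ doesn't move. It is on the mirror line.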
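The reflection in this example can be built directly from its definition. The sketch below uses floating point, so results are compared with a small tolerance; the function names are my own.

```python
import math

def reflection(u):
    """Build Q = I - 2 u u^T for a unit vector u."""
    n = len(u)
    return [[(1.0 if i == j else 0.0) - 2.0 * u[i] * u[j] for j in range(n)]
            for i in range(n)]

def apply(Q, x):
    return [sum(q * xi for q, xi in zip(row, x)) for row in Q]

def close(v, w, tol=1e-12):
    return all(math.isclose(a, b, abs_tol=tol) for a, b in zip(v, w))

u = [-1 / math.sqrt(2), 1 / math.sqrt(2)]
Q = reflection(u)

print(close(Q[0], [0.0, 1.0]) and close(Q[1], [1.0, 0.0]))   # True: Q swaps x and y
print(close(apply(Q, [3.0, 3.0]), [3.0, 3.0]))               # True: on the mirror line
print(close(apply(Q, u), [-u[0], -u[1]]))                    # True: u goes to -u
print(close(apply(Q, apply(Q, [1.0, 2.0])), [1.0, 2.0]))     # True: Q^2 = I
```

The same `reflection` function works in any dimension as long as `u` has length 1.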


Rotations preserve the length of every vector. So do reflections. So do permutations. 
So does multiplication by any orthogonal matrix Q: lengths and angles don't change.


Proof $\|Qx\|^2$ equals $\|x\|^2$ because $(Qx)^T(Qx) = x^T Q^T Q x = x^T I x = x^T x$.

If Q has orthonormal columns ($Q^T Q = I$), it leaves lengths unchanged:

$$\text{Same length for } Qx \qquad \|Qx\| = \|x\| \text{ for every vector } x. \tag{3}$$

Q also preserves dot products: $(Qx)^T(Qy) = x^T Q^T Q y = x^T y$. Just use $Q^T Q = I$!
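The proof only uses $Q^T Q = I$, so it applies to rectangular Q as well. Here is a minimal exact check with a hypothetical 3 by 2 matrix whose columns (3/5, 4/5, 0) and (0, 0, 1) are orthonormal; the matrix and the test vectors are my own choices for illustration.

```python
from fractions import Fraction

# Hypothetical 3 by 2 matrix Q with orthonormal columns.
Q = [[Fraction(3, 5), 0],
     [Fraction(4, 5), 0],
     [0, 1]]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def apply(M, x):
    return [dot(row, x) for row in M]

columns = [list(col) for col in zip(*Q)]
gram = [[dot(ci, cj) for cj in columns] for ci in columns]   # this is Q^T Q

x, y = [2, -7], [1, 3]
Qx, Qy = apply(Q, x), apply(Q, y)

print(gram == [[1, 0], [0, 1]])    # True: Q^T Q = I
print(dot(Qx, Qx), dot(x, x))      # 53 53   (same squared length)
print(dot(Qx, Qy), dot(x, y))      # -19 -19 (same dot product)
```

Since this Q is not square, $QQ^T$ is a 3 by 3 projection matrix and not the identity; only the left-side product is $I$.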
Projections Using Orthonormal Bases: Q Replaces A


Orthogonal matrices are excellent for computations—numbers can never grow too large 
when lengths of vectors are fixed. Stable computer codes use Q’s as much as possible. 

For projections onto subspaces, all formulas involve $A^T A$. The entries of $A^T A$ are the
dot products $a_i^T a_j$ of the basis vectors $a_1, \ldots, a_n$.

Suppose the basis vectors are actually orthonormal. The a's become the q's. Then
$A^T A$ simplifies to $Q^T Q = I$. Look at the improvements in $\hat{x}$ and p and P. Instead of
$Q^T Q$ we print a blank for the identity matrix:

$$\hat{x} = Q^T b \quad\text{and}\quad p = Q\hat{x} \quad\text{and}\quad P = Q \;\underline{\quad}\; Q^T. \tag{4}$$


The least squares solution of $Qx = b$ is $\hat{x} = Q^T b$. The projection matrix is $QQ^T$.


There are no matrices to invert. This is the point of an orthonormal basis. The best
$\hat{x} = Q^T b$ just has dot products of $q_1, \ldots, q_n$ with b. We have 1-dimensional projections!
The “coupling matrix” or “correlation matrix” $A^T A$ is now $Q^T Q = I$. There is no
coupling. When A is Q, with orthonormal columns, here is $p = Q\hat{x} = QQ^T b$:

$$\text{Projection onto q's}\qquad p = \begin{bmatrix} | & & | \\ q_1 & \cdots & q_n \\ | & & | \end{bmatrix} \begin{bmatrix} q_1^T b \\ \vdots \\ q_n^T b \end{bmatrix} = q_1(q_1^T b) + \cdots + q_n(q_n^T b). \tag{5}$$

Important case: When Q is square and m = n, the subspace is the whole space. Then
$Q^T = Q^{-1}$ and $\hat{x} = Q^T b$ is the same as $x = Q^{-1} b$. The solution is exact! The projection
of b onto the whole space is b itself. In this case $p = b$ and $P = QQ^T = I$.


You may think that projection onto the whole space is not worth mentioning. But when 
$p = b$, our formula assembles b out of its 1-dimensional projections. If $q_1, \ldots, q_n$ is an
orthonormal basis for the whole space, then Q is square. Every $b = QQ^T b$ is the sum of
its components along the q's:

$$b = q_1(q_1^T b) + q_2(q_2^T b) + \cdots + q_n(q_n^T b). \tag{6}$$


Transforms $QQ^T = I$ is the foundation of Fourier series and all the great “transforms”
of applied mathematics. They break vectors b or functions $f(x)$ into perpendicular pieces.
Then by adding the pieces in (6), the inverse transform puts b and $f(x)$ back together.


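Example 4 The columns of this orthogonal Q are orthonormal vectors $q_1, q_2, q_3$:

$$m = n = 3 \qquad Q = \frac{1}{3}\begin{bmatrix} -1 & 2 & 2 \\ 2 & -1 & 2 \\ 2 & 2 & -1 \end{bmatrix} \quad\text{has}\quad Q^T Q = QQ^T = I.$$

The separate projections of $b = (0, 0, 1)$ onto $q_1$ and $q_2$ and $q_3$ are $p_1$ and $p_2$ and $p_3$:

$$q_1(q_1^T b) = \tfrac{2}{3} q_1 \quad\text{and}\quad q_2(q_2^T b) = \tfrac{2}{3} q_2 \quad\text{and}\quad q_3(q_3^T b) = -\tfrac{1}{3} q_3.$$

The sum of the first two is the projection of b onto the plane of $q_1$ and $q_2$. The sum of all
three is the projection of b onto the whole space, which is $p_1 + p_2 + p_3 = b$ itself:

$$\text{Reconstruct } b \qquad b = p_1 + p_2 + p_3 = \tfrac{2}{3} q_1 + \tfrac{2}{3} q_2 - \tfrac{1}{3} q_3 = \frac{1}{9}\begin{bmatrix} -2 + 4 - 2 \\ 4 - 2 - 2 \\ 4 + 4 + 1 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix} = b.$$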
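This reconstruction can be checked exactly. The sketch below assumes Q is one third of the symmetric matrix with rows (-1, 2, 2), (2, -1, 2), (2, 2, -1), which is consistent with the three projection coefficients quoted in the example; the variable names are my own.

```python
from fractions import Fraction

third = Fraction(1, 3)
# Assumed matrix for Example 4: (1/3) times a symmetric 3 by 3 matrix.
Q = [[-third, 2 * third, 2 * third],
     [2 * third, -third, 2 * third],
     [2 * third, 2 * third, -third]]
columns = [list(col) for col in zip(*Q)]   # q1, q2, q3
b = [0, 0, 1]

def dot(u, v):
    return sum(x * y for x, y in zip(u, v))

# x-hat = Q^T b: just three dot products, no matrix to invert.
coeffs = [dot(q, b) for q in columns]

# One-dimensional projections p_i = q_i (q_i^T b), then their sum.
pieces = [[c * x for x in q] for c, q in zip(coeffs, columns)]
rebuilt = [sum(parts) for parts in zip(*pieces)]

# Projection onto the plane of q1 and q2 uses only the first two pieces.
plane = [x + y for x, y in zip(pieces[0], pieces[1])]

print(coeffs == [2 * third, 2 * third, -third])   # True
print(rebuilt == b)                                # True
```

Here `plane` is (2/9, 2/9, 8/9), and adding the third piece (-2/9, -2/9, 1/9) brings back b.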


The Gram-Schmidt Process 


The point of this section is that “orthogonal is good”. Projections and least squares 
always involve A? A. When this matrix becomes QTQ = J, the inverse is no problem. 
The one-dimensional projections are uncoupled. The best £ is QTb (just n separate dot 
products). For this to be true, we had to say “Jf the vectors are orthonormal”. 
Now we explain the “Gram-Schmidt way” to create orthonormal vectors. 


Start with three independent vectors a,b,c. We intend to construct three orthogonal 
vectors A,B,C. Then (at the end may be easiest) we divide A, B, C by their lengths. 
That produces three orthonormal vectors q, = A/||Al|, q2 = B/||Bl|, q3 = C/C]. 


Gram-Schmidt Begin by choosing A = a. This first direction is accepted as it comes. 
The next direction B must be perpendicular to A. Start with b and subtract its 
projection along A. This leaves the perpendicular part, which is the orthogonal vector B: 


First Gram-Schmidt step B=b- A. (7) 


A and B are orthogonal in Figure 4.11. Multiply equation (7) by A? to verify that ATB = 
ATb — Ab = 0. This vector B is what we have called the error vector e, perpendicular 
to A. Notice that B in equation (7) is not zero (otherwise a and b would be dependent). 
The directions A and B are now set. 

The third direction starts with c. This is not a combination of A and B (because c is 
not a combination of a and b). But most likely c is not perpendicular to A and B. So 
subtract off its components in those two directions to get a perpendicular direction C: 


Next Gram-Schmidt step   $C = c - \dfrac{A^Tc}{A^TA}\,A - \dfrac{B^Tc}{B^TB}\,B.$   (8)


This is the one and only idea of the Gram-Schmidt process. Subtract from every new
vector its projections in the directions already set. That idea is repeated at every step.²
If we had a fourth vector d, we would subtract three projections onto A, B, C to get D.


²I think Gram had the idea. I don't really know where Schmidt came in.
Figure 4.11: First project b onto the line through a and find the orthogonal B as b − p.
Then project c onto the AB plane and find C as c − p. Divide by $\|A\|$, $\|B\|$, $\|C\|$.


At the end, or immediately when each one is found, divide the orthogonal vectors A, B,
C, D by their lengths. The resulting vectors $q_1, q_2, q_3, q_4$ are orthonormal.


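Example of Gram-Schmidt   Suppose the independent non-orthogonal vectors a, b, c are

$$a = \begin{bmatrix} 1 \\ -1 \\ 0 \end{bmatrix} \quad\text{and}\quad b = \begin{bmatrix} 2 \\ 0 \\ -2 \end{bmatrix} \quad\text{and}\quad c = \begin{bmatrix} 3 \\ -3 \\ 3 \end{bmatrix}.$$

Then A = a has $A^TA = 2$ and $A^Tb = 2$. Subtract from b its projection p along A:

First step   $$B = b - \frac{A^Tb}{A^TA}A = b - \frac{2}{2}A = \begin{bmatrix} 1 \\ 1 \\ -2 \end{bmatrix}.$$

Check: $A^TB = 0$ as required. Now subtract the projections of c on A and B to get C:

Next step   $$C = c - \frac{A^Tc}{A^TA}A - \frac{B^Tc}{B^TB}B = c - \frac{6}{2}A + \frac{6}{6}B = \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix}.$$

Check: C = (1, 1, 1) is perpendicular to both A and B. Finally convert A, B, C to
unit vectors (length 1, orthonormal). The lengths of A, B, C are $\sqrt{2}$ and $\sqrt{6}$ and $\sqrt{3}$.
Divide by those lengths, for an orthonormal basis:

$$q_1 = \frac{1}{\sqrt{2}}\begin{bmatrix} 1 \\ -1 \\ 0 \end{bmatrix} \quad\text{and}\quad q_2 = \frac{1}{\sqrt{6}}\begin{bmatrix} 1 \\ 1 \\ -2 \end{bmatrix} \quad\text{and}\quad q_3 = \frac{1}{\sqrt{3}}\begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix}.$$

Usually A, B, C contain fractions. Almost always $q_1, q_2, q_3$ contain square roots.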
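The example can be replayed in exact arithmetic. This is a small Python sketch of equations (7) and (8), not code from the text; the helper names `dot` and `gram_schmidt_orthogonal` are made up for illustration, and fractions are used so that A, B, C come out exactly.

```python
from fractions import Fraction

def dot(u, v):
    """Ordinary dot product u^T v."""
    return sum(x * y for x, y in zip(u, v))

def gram_schmidt_orthogonal(vectors):
    """Return orthogonal (not yet normalized) vectors A, B, C, ...

    Each new vector v loses its projections (u^T v / u^T u) u onto the
    directions u already set, as in equations (7) and (8).
    """
    basis = []
    for v in vectors:
        w = [Fraction(x) for x in v]
        for u in basis:
            coeff = dot(u, v) / dot(u, u)      # projection coefficient
            w = [wi - coeff * ui for wi, ui in zip(w, u)]
        basis.append(w)
    return basis

a, b, c = (1, -1, 0), (2, 0, -2), (3, -3, 3)
A, B, C = gram_schmidt_orthogonal([a, b, c])
squared_lengths = [dot(A, A), dot(B, B), dot(C, C)]
print(A, B, C, squared_lengths)
```

The squared lengths are the numbers under the square roots when A, B, C are divided by their lengths.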
The Factorization A = QR


We started with a matrix A, whose columns were a, b, c. We ended with a matrix Q,
whose columns are $q_1, q_2, q_3$. How are those matrices related? Since the vectors a, b, c
are combinations of the q's (and vice versa), there must be a third matrix connecting A
to Q. This third matrix is the triangular R in A = QR.

The first step was $q_1 = a/\|a\|$ (other vectors not involved). The second step was
equation (7), where b is a combination of A and B. At that stage C and $q_3$ were not
involved. This non-involvement of later vectors is the key point of Gram-Schmidt:


• The vectors a and A and $q_1$ are all along a single line.
• The vectors a, b and A, B and $q_1, q_2$ are all in the same plane.
• The vectors a, b, c and A, B, C and $q_1, q_2, q_3$ are in one subspace (dimension 3).


At every step $a_1, \ldots, a_k$ are combinations of $q_1, \ldots, q_k$. Later q's are not involved.
The connecting matrix R is triangular, and we have A = QR:

$$\begin{bmatrix} a & b & c \end{bmatrix} = \begin{bmatrix} q_1 & q_2 & q_3 \end{bmatrix} \begin{bmatrix} q_1^Ta & q_1^Tb & q_1^Tc \\ & q_2^Tb & q_2^Tc \\ & & q_3^Tc \end{bmatrix} \quad\text{or}\quad A = QR. \qquad (9)$$

A = QR is Gram-Schmidt in a nutshell. Multiply by $Q^T$ to recognize $R = Q^TA$ above.


(Gram-Schmidt) From independent vectors $a_1, \ldots, a_n$, Gram-Schmidt constructs
orthonormal vectors $q_1, \ldots, q_n$. The matrices with these columns satisfy A = QR.
Then $R = Q^TA$ is upper triangular because later q's are orthogonal to earlier a's.


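Here are the original a's and the final q's from the example. The i, j entry of $R = Q^TA$
is row i of $Q^T$ times column j of A. The dot products $q_i^Ta_j$ go into R. Then A = QR:

$$A = \begin{bmatrix} 1 & 2 & 3 \\ -1 & 0 & -3 \\ 0 & -2 & 3 \end{bmatrix} = \begin{bmatrix} 1/\sqrt{2} & 1/\sqrt{6} & 1/\sqrt{3} \\ -1/\sqrt{2} & 1/\sqrt{6} & 1/\sqrt{3} \\ 0 & -2/\sqrt{6} & 1/\sqrt{3} \end{bmatrix} \begin{bmatrix} \sqrt{2} & \sqrt{2} & \sqrt{18} \\ 0 & \sqrt{6} & -\sqrt{6} \\ 0 & 0 & \sqrt{3} \end{bmatrix} = QR.$$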


Look closely at Q and R. The lengths of A, B, C are $\sqrt{2}, \sqrt{6}, \sqrt{3}$ on the diagonal of R.
The columns of Q are orthonormal. Because of the square roots, QR might look harder
than LU. Both factorizations are absolutely central to calculations in linear algebra.

Any m by n matrix A with independent columns can be factored into A = QR. The
m by n matrix Q has orthonormal columns, and the square matrix R is upper triangular
with positive diagonal. We must not forget why this is useful for least squares:
$A^TA = (QR)^TQR = R^TQ^TQR = R^TR$. The least squares equation $A^TA\hat{x} =
A^Tb$ simplifies to $R^TR\hat{x} = R^TQ^Tb$. Then finally we reach $R\hat{x} = Q^Tb$: good.

Least squares   $R^TR\hat{x} = R^TQ^Tb$   or   $R\hat{x} = Q^Tb$   or   $\hat{x} = R^{-1}Q^Tb$   (10)


Instead of solving Ax = b, which is impossible, we solve $R\hat{x} = Q^Tb$ by back substitution,
which is very fast. The real cost is the $mn^2$ multiplications in the Gram-Schmidt process,
which are needed to construct the orthogonal Q and the triangular R with A = QR.
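Here is a short Python sketch of the step $R\hat{x} = Q^Tb$. It takes Q and R from the first two columns a, b of the example above (so A is 3 by 2). The right side `rhs` and the helper name `back_substitute` are assumptions chosen for illustration, not taken from the text.

```python
from math import sqrt

# Q (3 by 2) and R (2 by 2) for the columns a = (1,-1,0) and b = (2,0,-2).
Q = [[ 1 / sqrt(2),  1 / sqrt(6)],
     [-1 / sqrt(2),  1 / sqrt(6)],
     [ 0.0,         -2 / sqrt(6)]]
R = [[sqrt(2), sqrt(2)],
     [0.0,     sqrt(6)]]
rhs = [1.0, 2.0, 3.0]   # hypothetical right side b, not in the column space

def back_substitute(R, y):
    """Solve the upper triangular system R x = y from the last row upward."""
    n = len(y)
    x = [0.0] * n
    for i in reversed(range(n)):
        known = sum(R[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (y[i] - known) / R[i][i]
    return x

# y = Q^T b is just n separate dot products with the columns of Q.
y = [sum(Q[i][j] * rhs[i] for i in range(3)) for j in range(2)]
x_hat = back_substitute(R, y)
print(x_hat)
```

The same $\hat{x}$ solves the normal equations $A^TA\hat{x} = A^Tb$, here with $A^TA = \begin{bmatrix} 2 & 2 \\ 2 & 8 \end{bmatrix}$ and $A^Tb = (-1, -4)$.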


Below is an informal code. It executes equations (11) for j = 1 then j = 2 and eventually
j = n. The important lines 4-5 subtract from $v = a_j$ its projection onto each $q_i$, i < j.
The last line of that code normalizes v (divides by $r_{jj} = \|v\|$) to get the unit vector $q_j$:

$$r_{kj} = \sum_{i=1}^{m} q_{ik}v_{ij} \quad\text{and}\quad v_{ij} = v_{ij} - q_{ik}r_{kj} \quad\text{and}\quad r_{jj} = \Big(\sum_{i=1}^{m} v_{ij}^2\Big)^{1/2} \quad\text{and}\quad q_{ij} = \frac{v_{ij}}{r_{jj}}. \qquad (11)$$

Starting from a, b, c = $a_1, a_2, a_3$ this code will construct $q_1$, then B, $q_2$, then C, $q_3$:

$q_1 = a_1/\|a_1\|$     $B = a_2 - (q_1^Ta_2)q_1$     $q_2 = B/\|B\|$
$C^* = a_3 - (q_1^Ta_3)q_1$     $C = C^* - (q_2^TC^*)q_2$     $q_3 = C/\|C\|$


Equation (11) subtracts one projection at a time as in $C^*$ and C. That change is called
modified Gram-Schmidt. This code is numerically more stable than equation (8) which
subtracts all projections at once.


for j = 1:n                          % modified Gram-Schmidt
    v = A(:, j);                     % v begins as column j of the original A
    for i = 1:j-1                    % columns q_1 to q_{j-1} are already settled in Q
        R(i, j) = Q(:, i)' * v;      % compute R_ij = q_i^T a_j which is q_i^T v
        v = v - R(i, j) * Q(:, i);   % subtract the projection (q_i^T v) q_i
    end                              % v is now perpendicular to all of q_1, ..., q_{j-1}
    R(j, j) = norm(v);               % the diagonal entries R_jj are lengths
    Q(:, j) = v / R(j, j);           % divide v by its length to get the next q_j
end                                  % the "for j = 1:n" loop produces all of the q_j


To recover column j of A, undo the last step and the middle steps of the code:

$$R(j,j)\,q_j = (v \text{ minus its projections}) = (\text{column } j \text{ of } A) - \sum_{i=1}^{j-1} R(i,j)\,q_i. \qquad (12)$$


Moving the sum to the far left, this is column j in the multiplication QR = A. 
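For readers without MATLAB, the same loop can be sketched in plain Python. This is a direct port written for illustration (the function name `modified_gram_schmidt` is an assumption), applied to the 3 by 3 example matrix A from this section.

```python
from math import sqrt

def modified_gram_schmidt(A):
    """A is a list of m rows with n independent columns.

    Returns Q (m by n, orthonormal columns) and R (n by n, upper triangular).
    """
    m, n = len(A), len(A[0])
    Q = [[0.0] * n for _ in range(m)]
    R = [[0.0] * n for _ in range(n)]
    for j in range(n):
        v = [float(A[k][j]) for k in range(m)]                # column j of A
        for i in range(j):                                    # earlier q's are settled
            R[i][j] = sum(Q[k][i] * v[k] for k in range(m))   # q_i^T v
            v = [v[k] - R[i][j] * Q[k][i] for k in range(m)]  # subtract projection
        R[j][j] = sqrt(sum(x * x for x in v))                 # length of what is left
        for k in range(m):
            Q[k][j] = v[k] / R[j][j]                          # next unit vector q_j
    return Q, R

A = [[1, 2, 3], [-1, 0, -3], [0, -2, 3]]
Q, R = modified_gram_schmidt(A)
# Multiply back: column j of QR should reproduce column j of A, as in (12).
QR = [[sum(Q[i][k] * R[k][j] for k in range(3)) for j in range(3)]
      for i in range(3)]
print(R)
```

The diagonal of R should show the lengths $\sqrt{2}, \sqrt{6}, \sqrt{3}$ found earlier, up to roundoff.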
Confession   Good software like LAPACK, used in good systems like MATLAB and
Julia and Python, will not use this Gram-Schmidt code. There is now a better way.
"Householder reflections" act on A to produce the upper triangular R. This happens one
column at a time in the same way that elimination produces the upper triangular U in LU.

Those reflection matrices $I - 2uu^T$ will be described in Chapter 11 on numerical linear
algebra. If A is tridiagonal we can simplify even more to use 2 by 2 rotations. The result
is always A = QR and the MATLAB command to orthogonalize A is [Q, R] = qr(A).
I believe that Gram-Schmidt is still the good process to understand, even if the reflections
or rotations lead to a more perfect Q.


= REVIEW OF THE KEY IDEAS = 


1. If the orthonormal vectors $q_1, \ldots, q_n$ are the columns of Q, then $q_i^Tq_j = 0$ and
$q_i^Tq_i = 1$ translate into the matrix multiplication $Q^TQ = I$.

2. If Q is square (an orthogonal matrix) then $Q^T = Q^{-1}$: transpose = inverse.

3. The length of Qx equals the length of x: $\|Qx\| = \|x\|$.

4. The projection onto the column space of Q spanned by the q's is $P = QQ^T$.

5. If Q is square then $P = QQ^T = I$ and every $b = q_1(q_1^Tb) + \cdots + q_n(q_n^Tb)$.

6. Gram-Schmidt produces orthonormal vectors $q_1, q_2, q_3$ from independent a, b, c.
In matrix form this is the factorization A = QR = (orthogonal Q)(triangular R).


= WORKED EXAMPLES = 


4.4 A   Add two more columns with all entries 1 or −1, so the columns of this 4 by 4
"Hadamard matrix" are orthogonal. How do you turn $H_4$ into an orthogonal matrix Q?

$$H_2 = \begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix} \qquad H_4 = \begin{bmatrix} 1 & 1 & x & x \\ 1 & -1 & x & x \\ 1 & 1 & x & x \\ 1 & -1 & x & x \end{bmatrix}$$

The block matrix $H_8 = \begin{bmatrix} H_4 & H_4 \\ H_4 & -H_4 \end{bmatrix}$ is the next Hadamard matrix with 1's and −1's.
What is the product $H_8^TH_8$?


The projection of b = (6, 0, 0, 2) onto the first column of $H_4$ is $p_1$ = (2, 2, 2, 2). The
projection onto the second column is $p_2$ = (1, −1, 1, −1). What is the projection $p_{1,2}$ of b
onto the 2-dimensional space spanned by the first two columns?

Solution   $H_4$ can be built from $H_2$ just as $H_8$ is built from $H_4$:

$$H_4 = \begin{bmatrix} H_2 & H_2 \\ H_2 & -H_2 \end{bmatrix} = \begin{bmatrix} 1 & 1 & 1 & 1 \\ 1 & -1 & 1 & -1 \\ 1 & 1 & -1 & -1 \\ 1 & -1 & -1 & 1 \end{bmatrix} \text{ has orthogonal columns.}$$

Then Q = H/2 has orthonormal columns. Dividing by 2 gives unit vectors in Q. A 5 by
5 Hadamard matrix is impossible because the dot product of columns would have five 1's
and/or −1's and could not add to zero. $H_8$ has orthogonal columns of length $\sqrt{8}$.

$$H_8^TH_8 = \begin{bmatrix} H^T & H^T \\ H^T & -H^T \end{bmatrix} \begin{bmatrix} H & H \\ H & -H \end{bmatrix} = \begin{bmatrix} 2H^TH & 0 \\ 0 & 2H^TH \end{bmatrix} = \begin{bmatrix} 8I & 0 \\ 0 & 8I \end{bmatrix}. \qquad Q_8 = \frac{H_8}{\sqrt{8}}$$
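The doubling pattern is easy to check by machine. This Python sketch (the helper names `next_hadamard` and `gram` are made up for illustration) builds $H_4$ and $H_8$ from $H_2$ and forms $H^TH$, which should be a multiple of the identity when the columns are orthogonal.

```python
def next_hadamard(H):
    """Build the block matrix [[H, H], [H, -H]] from a square matrix H."""
    top = [row + row for row in H]
    bottom = [row + [-x for x in row] for row in H]
    return top + bottom

def gram(H):
    """Return H^T H: all dot products between columns of H."""
    n = len(H)
    return [[sum(H[k][i] * H[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

H2 = [[1, 1], [1, -1]]
H4 = next_hadamard(H2)
H8 = next_hadamard(H4)
G4 = gram(H4)   # expected 4I
G8 = gram(H8)   # expected 8I
print(H4)
print(G8[0])
```

Dividing $H_4$ by 2 and $H_8$ by $\sqrt{8}$ then gives orthonormal columns.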


4.4 B   What is the key point of orthogonal columns? Answer: $A^TA$ is diagonal and
easy to invert. We can project onto lines and just add. The axes are orthogonal.


Add p's   Projection $p_{1,2}$ onto a plane equals $p_1 + p_2$ onto orthogonal lines.


Problem Set 4.4 


Problems 1-12 are about orthogonal vectors and orthogonal matrices. 


1 Are these pairs of vectors orthonormal or only orthogonal or only independent? 


© ffi] © (J [4] © jete 


Change the second vector when necessary to produce orthonormal vectors. 


2   The vectors (2, 2, −1) and (−1, 2, 2) are orthogonal. Divide them by their lengths to
find orthonormal vectors $q_1$ and $q_2$. Put those into the columns of Q and multiply
$Q^TQ$ and $QQ^T$.

3   (a) If A has three orthogonal columns each of length 4, what is $A^TA$?
(b) If A has three orthogonal columns of lengths 1, 2, 3, what is $A^TA$?


4 Give an example of each of the following: 


(a) A matrix Q that has orthonormal columns but $QQ^T \neq I$.
(b) Two orthogonal vectors that are not linearly independent.
(c) An orthonormal basis for $R^3$, including the vector $q_1 = (1, 1, 1)/\sqrt{3}$.


5 Find two orthogonal vectors in the plane x + y + 2z = 0. Make them orthonormal. 


6   If $Q_1$ and $Q_2$ are orthogonal matrices, show that their product $Q_1Q_2$ is also an
orthogonal matrix. (Use $Q^TQ = I$.)

7   If Q has orthonormal columns, what is the least squares solution $\hat{x}$ to Qx = b?


8   If $q_1$ and $q_2$ are orthonormal vectors in $R^5$, what combination ____ $q_1$ + ____ $q_2$
is closest to a given vector b?


9   (a) Compute $P = QQ^T$ when $q_1$ = (.8, .6, 0) and $q_2$ = (−.6, .8, 0). Verify that
$P^2 = P$.
(b) Prove that always $(QQ^T)^2 = QQ^T$ by using $Q^TQ = I$. Then $P = QQ^T$ is
the projection matrix onto the column space of Q.


10 Orthonormal vectors are automatically linearly independent. 


(a) Vector proof: When $c_1q_1 + c_2q_2 + c_3q_3 = 0$, what dot product leads to $c_1 = 0$?
Similarly $c_2 = 0$ and $c_3 = 0$. Thus the q's are independent.


(b) Matrix proof: Show that Qx = 0 leads to x = 0. Since Q may be rectangular,
you can use $Q^T$ but not $Q^{-1}$.


11   (a) Gram-Schmidt: Find orthonormal vectors $q_1$ and $q_2$ in the plane spanned by
a = (1, 3, 4, 5, 7) and b = (−6, 6, 8, 0, 8).


(b) Which vector in this plane is closest to (1, 0, 0, 0, 0)?


12   If $a_1, a_2, a_3$ is a basis for $R^3$, any vector b can be written as

$$b = x_1a_1 + x_2a_2 + x_3a_3 \quad\text{or}\quad \begin{bmatrix} a_1 & a_2 & a_3 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = b.$$

(a) Suppose the a's are orthonormal. Show that $x_1 = a_1^Tb$.
(b) Suppose the a's are orthogonal. Show that $x_1 = a_1^Tb/a_1^Ta_1$.
(c) If the a's are independent, $x_1$ is the first component of ____ times b.


Problems 13-25 are about the Gram-Schmidt process and A = QR. 


13 What multiple of a = [+] should be subtracted from b = Fa to make the result B 
orthogonal to a? Sketch a figure to show a, b, and B. 


14   Complete the Gram-Schmidt process in Problem 13 by computing $q_1 = a/\|a\|$ and
$q_2 = B/\|B\|$ and factoring into QR:


f of = [a a 3! isil 
15   (a) Find orthonormal vectors $q_1, q_2, q_3$ such that $q_1, q_2$ span the column space of

$$A = \begin{bmatrix} 1 & 1 \\ 2 & -1 \\ -2 & 4 \end{bmatrix}.$$

(b) Which of the four fundamental subspaces contains $q_3$?

(c) Solve Ax = (1, 2, 7) by least squares.


16   What multiple of a = (4, 5, 2, 2) is closest to b = (1, 2, 0, 0)? Find orthonormal
vectors $q_1$ and $q_2$ in the plane of a and b.


17   Find the projection of b onto the line through a:

$$a = \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix} \quad\text{and}\quad b = \begin{bmatrix} 1 \\ 3 \\ 5 \end{bmatrix} \quad\text{and}\quad p = \ ? \quad\text{and}\quad e = b - p = \ ?$$

Compute the orthonormal vectors $q_1 = a/\|a\|$ and $q_2 = e/\|e\|$.


18   (Recommended) Find orthogonal vectors A, B, C by Gram-Schmidt from a, b, c:
a = (1, −1, 0, 0)   b = (0, 1, −1, 0)   c = (0, 0, 1, −1).

A, B, C and a, b, c are bases for the vectors perpendicular to d = (1, 1, 1, 1).


19   If A = QR then $A^TA = R^TR$ = ____ triangular times ____ triangular.
Gram-Schmidt on A corresponds to elimination on $A^TA$. The pivots for $A^TA$ must
be the squares of diagonal entries of R. Find Q and R by Gram-Schmidt for this A:

$$A = \begin{bmatrix} -1 & 1 \\ 2 & 1 \\ 2 & 4 \end{bmatrix} \quad\text{and}\quad A^TA = \begin{bmatrix} 9 & 9 \\ 9 & 18 \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 1 & 1 \end{bmatrix} \begin{bmatrix} 9 & 9 \\ 0 & 9 \end{bmatrix}.$$


20   True or false (give an example in either case):

(a) $Q^{-1}$ is an orthogonal matrix when Q is an orthogonal matrix.

(b) If Q (3 by 2) has orthonormal columns then $\|Qx\|$ always equals $\|x\|$.


21   Find an orthonormal basis for the column space of A:
1 
1 
1 
1 


Then compute the projection of b onto that column space. 
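22   Find orthogonal vectors A, B, C by Gram-Schmidt from

$$a = \begin{bmatrix} 1 \\ 1 \\ 2 \end{bmatrix} \quad\text{and}\quad b = \begin{bmatrix} 1 \\ -1 \\ 0 \end{bmatrix} \quad\text{and}\quad c = \begin{bmatrix} 1 \\ 0 \\ 4 \end{bmatrix}.$$


23   Find $q_1, q_2, q_3$ (orthonormal) as combinations of a, b, c (independent columns).
Then write A as QR:

$$A = \begin{bmatrix} 1 & 2 & 4 \\ 0 & 0 & 5 \\ 0 & 3 & 6 \end{bmatrix}.$$

24   (a) Find a basis for the subspace S in $R^4$ spanned by all solutions of
$x_1 + x_2 + x_3 - x_4 = 0$.
(b) Find a basis for the orthogonal complement $S^\perp$.
(c) Find $b_1$ in S and $b_2$ in $S^\perp$ so that $b_1 + b_2 = b = (1, 1, 1, 1)$.

25   If ad − bc > 0, the entries in A = QR are

$$\begin{bmatrix} a & b \\ c & d \end{bmatrix} = \frac{\begin{bmatrix} a & -c \\ c & a \end{bmatrix}}{\sqrt{a^2 + c^2}} \; \frac{\begin{bmatrix} a^2 + c^2 & ab + cd \\ 0 & ad - bc \end{bmatrix}}{\sqrt{a^2 + c^2}}.$$

Write A = QR when a, b, c, d = 2, 1, 1, 1 and also 1, 1, 1, 1. Which entry of R
becomes zero when the columns are dependent and Gram-Schmidt breaks down?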


Problems 26-29 use the QR code in equation (11). It executes Gram-Schmidt.

26   Show why C (found via $C^*$ in the steps after (11)) is equal to C in equation (8).


27   Equation (8) subtracts from c its components along A and B. Why not subtract the
components along a and along b?


28   Where are the $mn^2$ multiplications in equation (11)?


29   Apply the MATLAB qr code to a = (2, 2, −1), b = (0, −3, 3), c = (1, 0, 0). What
are the q's?


Problems 30-35 involve orthogonal matrices that are special. 


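30   The first four wavelets are in the columns of this wavelet matrix W:

$$W = \frac{1}{2} \begin{bmatrix} 1 & 1 & \sqrt{2} & 0 \\ 1 & 1 & -\sqrt{2} & 0 \\ 1 & -1 & 0 & \sqrt{2} \\ 1 & -1 & 0 & -\sqrt{2} \end{bmatrix}$$

What is special about the columns? Find the inverse wavelet transform $W^{-1}$.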
31   (a) Choose c so that Q is an orthogonal matrix:

$$Q = c \begin{bmatrix} 1 & -1 & -1 & -1 \\ -1 & 1 & -1 & -1 \\ -1 & -1 & 1 & -1 \\ -1 & -1 & -1 & 1 \end{bmatrix}.$$

Project b = (1, 1, 1, 1) onto the first column. Then project b onto the plane of the
first two columns.


32   If u is a unit vector, then $Q = I - 2uu^T$ is a reflection matrix (Example 3). Find $Q_1$
from u = (0, 1) and $Q_2$ from u = $(0, \sqrt{2}/2, \sqrt{2}/2)$. Draw the reflections when $Q_1$
and $Q_2$ multiply the vectors (1, 2) and (1, 1, 1).

33   Find all matrices that are both orthogonal and lower triangular.


34   $Q = I - 2uu^T$ is a reflection matrix when $u^Tu = 1$. Two reflections give $Q^2 = I$.

(a) Show that Qu = −u. The mirror is perpendicular to u.

(b) Find Qv when $u^Tv = 0$. The mirror contains v. It reflects to itself.


Challenge Problems


35   (MATLAB) Factor [Q, R] = qr(A) for A = eye(4) − diag([1 1 1], −1). You
are orthogonalizing the columns (1, −1, 0, 0) and (0, 1, −1, 0) and (0, 0, 1, −1) and
(0, 0, 0, 1) of A. Can you scale the orthogonal columns of Q to get nice integer
components?


36   If A is m by n with rank n, qr(A) produces a square Q and zeros below R:

The factors from MATLAB are (m by m)(m by n)   $A = \begin{bmatrix} Q_1 & Q_2 \end{bmatrix} \begin{bmatrix} R \\ 0 \end{bmatrix}$.

The n columns of $Q_1$ are an orthonormal basis for which fundamental subspace?
The m − n columns of $Q_2$ are an orthonormal basis for which fundamental subspace?


37   We know that $P = QQ^T$ is the projection onto the column space of Q (m by n).
Now add another column a to produce A = [Q a]. Gram-Schmidt replaces a by
what vector q? Start with a, subtract ____, divide by ____ to find q.
Determinants


1   The determinant of $A = \begin{bmatrix} a & b \\ c & d \end{bmatrix}$ is ad − bc. Singular matrix $A = \begin{bmatrix} a & xa \\ b & xb \end{bmatrix}$ has det = 0.


2   Row exchange reverses signs   $PA = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix} \begin{bmatrix} a & b \\ c & d \end{bmatrix} = \begin{bmatrix} c & d \\ a & b \end{bmatrix}$ has det PA = bc − ad = −det A.


3   The determinant of $\begin{bmatrix} xa + yA & xb + yB \\ c & d \end{bmatrix}$ is x(ad − bc) + y(Ad − Bc). Det is linear in row 1 by itself.


4   Elimination $EA = \begin{bmatrix} a & b \\ 0 & d - \frac{c}{a}b \end{bmatrix}$   $\det EA = a\left(d - \frac{c}{a}b\right)$ = product of pivots = det A.


5   If A is n by n then 1, 2, 3, 4 remain true: det = 0 when A is singular, det reverses sign
when rows are exchanged, det is linear in row 1 by itself, det = product of the pivots.
Always det BA = (det B)(det A) and $\det A^T = \det A$. This is an amazing number.


5.1 The Properties of Determinants 


The determinant of a square matrix is a single number. That number contains an amazing
amount of information about the matrix. It tells immediately whether the matrix is
invertible. The determinant is zero when the matrix has no inverse. When A is invertible, the
determinant of $A^{-1}$ is 1/(det A). If det A = 2 then $\det A^{-1} = \frac{1}{2}$. In fact the determinant
leads to a formula for every entry in $A^{-1}$.

This is one use for determinants: to find formulas for inverse matrices and pivots and
solutions $A^{-1}b$. For a large matrix we seldom use those formulas, because elimination is
faster. For a 2 by 2 matrix with entries a, b, c, d, its determinant ad − bc shows how $A^{-1}$
changes as A changes. Notice the division by the determinant!

$$A = \begin{bmatrix} a & b \\ c & d \end{bmatrix} \quad\text{has inverse}\quad A^{-1} = \frac{1}{ad - bc} \begin{bmatrix} d & -b \\ -c & a \end{bmatrix}. \qquad (1)$$
Multiply those matrices to get I. When the determinant is ad − bc = 0, we are asked to
divide by zero and we can't. Then A has no inverse. (The rows are parallel when a/c =
b/d. This gives ad = bc and det A = 0.) Dependent rows always lead to det A = 0.

The determinant is also connected to the pivots. For a 2 by 2 matrix the pivots are a
and d − (c/a)b. The product of the pivots is the determinant:

Product of pivots   $a\left(d - \frac{c}{a}b\right) = ad - bc$   which is det A.

After a row exchange the pivots change to c and b − (a/c)d. Those new pivots multiply to
give bc − ad. The row exchange to $\begin{bmatrix} c & d \\ a & b \end{bmatrix}$ reversed the sign of the determinant.


Looking ahead The determinant of an n by n matrix can be found in three ways: 


1 Multiply the n pivots (times 1 or —1) This is the pivot formula. 
2 Add up n! terms (times 1 or —1) This is the “big” formula. 
3 Combine n smaller determinants (times 1 or —1) This is the cofactor formula. 


You see that plus or minus signs—the decisions between 1 and —1—play a big part in 
determinants. That comes from the following rule for n by n matrices: 


The determinant changes sign when two rows (or two columns) are exchanged. 


The identity matrix has determinant +1. Exchange two rows and det P = −1. Exchange
two more rows and the new permutation has det P = +1. Half of all permutations are
even (det P = 1) and half are odd (det P = −1). Starting from I, half of the P's involve
an even number of exchanges and half require an odd number. In the 2 by 2 case, ad has a
plus sign and bc has minus, coming from the row exchange:

$$\det \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} = 1 \quad\text{and}\quad \det \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix} = -1.$$

The other essential rule is linearity, but a warning comes first. Linearity does not mean
that det(A + B) = det A + det B. This is absolutely false. That kind of linearity is not
even true when A = I and B = I. The false rule would say that det(I + I) = 1 + 1 = 2.
The true rule is $\det 2I = 2^n$. Determinants are multiplied by $2^n$ (not just by 2) when
matrices are multiplied by 2.

We don’t intend to define the determinant by its formulas. It is better to start with 
its properties—sign reversal and linearity. The properties are simple (Section 5.1). They 
prepare for the formulas (Section 5.2). Then come the applications, including these three: 


(1) Determinants give $A^{-1}$ and $A^{-1}b$ (this formula is called Cramer's Rule).
(2) When the edges of a box are the rows of A, the volume is |det A|.


(3) For n special numbers λ, called eigenvalues, the determinant of A − λI is zero.
This is a truly important application and it fills Chapter 6.
The Properties of the Determinant


Determinants have three basic properties (rules 1, 2, 3). By using those rules we can
compute the determinant of any square matrix A. This number is written in two ways,
det A and |A|. Notice: Brackets for the matrix, straight bars for its determinant. When
A is a 2 by 2 matrix, the rules 1, 2, 3 lead to the answer we expect:

The determinant of $\begin{bmatrix} a & b \\ c & d \end{bmatrix}$ is $\begin{vmatrix} a & b \\ c & d \end{vmatrix} = ad - bc$.

From rules 1-3 we will reach rules 4-10. The last two are det(AB) = (det A)(det B) and
$\det A^T = \det A$. We will check all rules with the 2 by 2 formula, but do not forget: The
rules apply to any n by n matrix A.
Rule 1 (the easiest) matches det I = 1 with volume = 1 for a unit cube.


1   The determinant of the n by n identity matrix is 1.

$$\begin{vmatrix} 1 & 0 \\ 0 & 1 \end{vmatrix} = 1 \quad\text{and}\quad \begin{vmatrix} 1 & & \\ & \ddots & \\ & & 1 \end{vmatrix} = 1.$$

2   The determinant changes sign when two rows are exchanged (sign reversal):

Check:   $\begin{vmatrix} c & d \\ a & b \end{vmatrix} = -\begin{vmatrix} a & b \\ c & d \end{vmatrix}$   (both sides equal bc − ad).


Because of this rule, we can find det P for any permutation matrix. Just exchange rows
of I until you reach P. Then det P = +1 for an even number of row exchanges and
det P = −1 for an odd number.
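Counting exchanges is mechanical, so here is a small Python sketch of it. The representation is an assumption made for illustration: `perm[i]` names the column that holds the 1 in row i of P, and the function name `permutation_sign` is hypothetical. The code exchanges rows until P becomes I and flips the sign each time, which gives the same parity as going from I to P.

```python
def permutation_sign(perm):
    """Return det P = +1 or -1 for the permutation matrix described by perm.

    perm[i] is the column holding the 1 in row i. Each swap below is one
    row exchange, and each exchange reverses the sign (rule 2).
    """
    p = list(perm)
    sign = 1
    for i in range(len(p)):
        while p[i] != i:
            j = p[i]
            p[i], p[j] = p[j], p[i]   # one row exchange puts row j in place
            sign = -sign
    return sign

identity = [0, 1, 2]
one_exchange = [1, 0, 2]          # rows 1 and 2 of I exchanged
reverse_3 = [2, 1, 0]             # 3 by 3 reverse identity
reverse_4 = [3, 2, 1, 0]          # 4 by 4 reverse identity
print(permutation_sign(reverse_3), permutation_sign(reverse_4))
```

The 3 by 3 reverse identity needs one exchange and the 4 by 4 reverse identity needs two, so their determinants have opposite signs.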

The third rule has to make the big jump to the determinants of all matrices. 


3 The determinant is a linear function of each row separately (all other rows stay fixed). 
If the first row is multiplied by t, the determinant is multiplied by t. If first rows are added, 
determinants are added. This rule only applies when the other rows do not change! Notice 
how c and d stay the same: 


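multiply row 1 by any number t; det is multiplied by t:
$$\begin{vmatrix} ta & tb \\ c & d \end{vmatrix} = t \begin{vmatrix} a & b \\ c & d \end{vmatrix}$$

add row 1 of A to row 1 of A′; then determinants add:
$$\begin{vmatrix} a + a' & b + b' \\ c & d \end{vmatrix} = \begin{vmatrix} a & b \\ c & d \end{vmatrix} + \begin{vmatrix} a' & b' \\ c & d \end{vmatrix}$$


In the first case, both sides are tad − tbc. Then t factors out. In the second case, both sides
are ad + a′d − bc − b′c. These rules still apply when A is n by n, and one row changes.

$$\begin{vmatrix} 4 & 8 & 8 \\ 0 & 1 & 1 \\ 0 & 0 & 1 \end{vmatrix} = 4 \begin{vmatrix} 1 & 2 & 2 \\ 0 & 1 & 1 \\ 0 & 0 & 1 \end{vmatrix} \quad\text{and}\quad \begin{vmatrix} 4 & 8 & 8 \\ 0 & 1 & 1 \\ 0 & 0 & 1 \end{vmatrix} = \begin{vmatrix} 4 & 0 & 0 \\ 0 & 1 & 1 \\ 0 & 0 & 1 \end{vmatrix} + \begin{vmatrix} 0 & 8 & 8 \\ 0 & 1 & 1 \\ 0 & 0 & 1 \end{vmatrix}$$

By itself, rule 3 does not say what those determinants are (det A is 4).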
Combining multiplication and addition, we get any linear combination in one row.
Rule 2 for row exchanges can put that row into the first row and back again.

This rule does not mean that det 2I = 2 det I. To obtain 2I we have to multiply both
rows by 2, and the factor 2 comes out both times:

$$\begin{vmatrix} 2 & 0 \\ 0 & 2 \end{vmatrix} = 2^2 = 4 \quad\text{and}\quad \begin{vmatrix} t & 0 \\ 0 & t \end{vmatrix} = t^2.$$

This is just like area and volume. Expand a rectangle by 2 and its area increases by 4.
Expand an n-dimensional box by t and its volume increases by $t^n$. The connection is no
accident: we will see how determinants equal volumes.
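A quick numerical check may help separate the two statements: linearity in one row, and the factor $t^n$ for the whole matrix. This Python sketch uses only the 2 by 2 formula ad − bc; the sample numbers are arbitrary choices, not from the text.

```python
def det2(M):
    """2 by 2 determinant ad - bc."""
    (a, b), (c, d) = M
    return a * d - b * c

a, b, c, d, t = 3, 5, 2, 7, 4      # arbitrary sample entries and multiplier
a2, b2 = -1, 6                     # a second choice for row 1

det_A = det2([[a, b], [c, d]])
det_A2 = det2([[a2, b2], [c, d]])

scaled_row = det2([[t * a, t * b], [c, d]])          # multiply row 1 only
added_rows = det2([[a + a2, b + b2], [c, d]])        # add the two first rows
scaled_all = det2([[t * a, t * b], [t * c, t * d]])  # multiply every row

print(det_A, scaled_row, added_rows, scaled_all)
```

Multiplying one row by t multiplies the determinant by t, while multiplying both rows gives $t^2$.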

Pay special attention to rules 1-3. They completely determine the number det A. We 
could stop here to find a formula for n by n determinants (a little complicated). We prefer 
to go gradually, because rules 4 — 10 make determinants much easier to work with. 


4   If two rows of A are equal, then det A = 0.

Equal rows   Check 2 by 2:   $\begin{vmatrix} a & b \\ a & b \end{vmatrix} = 0.$


Rule 4 follows from rule 2. (Remember we must use the rules and not the 2 by 2 formula.)
Exchange the two equal rows. The determinant D is supposed to change sign. But also D
has to stay the same, because the matrix is not changed. The only number with −D = D
is D = 0, so this must be the determinant. (Note: In Boolean algebra the reasoning fails,
because −1 = 1. Then D is defined by rules 1, 3, 4.)

A matrix with two equal rows has no inverse. Rule 4 makes det A = 0. But matrices 
can be singular and determinants can be zero without having equal rows! Rule 5 will be 
the key. We can do row operations (like elimination) without changing det A. 


5   Subtracting a multiple of one row from another row leaves det A unchanged.

ℓ times row 1 from row 2   $$\begin{vmatrix} a & b \\ c - \ell a & d - \ell b \end{vmatrix} = \begin{vmatrix} a & b \\ c & d \end{vmatrix}.$$

Rule 3 (linearity) splits the left side into the right side plus another term $-\ell \begin{vmatrix} a & b \\ a & b \end{vmatrix}$.
This extra term is zero by rule 4: equal rows. Therefore rule 5 is correct (not just 2 by 2).


Conclusion   The determinant is not changed by the usual elimination steps from A to U.
Thus det A equals det U. If we can find determinants of triangular matrices U, we can
find determinants of all matrices A. Every row exchange reverses the sign, so always
det A = ± det U. Rule 5 has narrowed the problem to triangular matrices.


6   A matrix with a row of zeros has det A = 0.

Row of zeros   $\begin{vmatrix} 0 & 0 \\ c & d \end{vmatrix} = 0$   and   $\begin{vmatrix} a & b \\ 0 & 0 \end{vmatrix} = 0.$


For an easy proof, add some other row to the zero row. The determinant is not changed
(rule 5). But the matrix now has two equal rows. So det A = 0 by rule 4.
7   If A is triangular then $\det A = a_{11}a_{22} \cdots a_{nn}$ = product of diagonal entries.

Triangular   $\begin{vmatrix} a & b \\ 0 & d \end{vmatrix} = ad$   and also   $\begin{vmatrix} a & 0 \\ c & d \end{vmatrix} = ad.$


Suppose all diagonal entries are nonzero. Remove the off-diagonal entries by elimination!
(If A is lower triangular, subtract multiples of each row from lower rows. If A is upper
triangular, subtract from higher rows.) By rule 5 the determinant is not changed, and now
the matrix is diagonal:

Diagonal matrix   $$\det \begin{bmatrix} a_{11} & & 0 \\ & \ddots & \\ 0 & & a_{nn} \end{bmatrix} = (a_{11})(a_{22}) \cdots (a_{nn}).$$

Factor $a_{11}$ from the first row by rule 3. Then factor $a_{22}$ from the second row. Eventually
factor $a_{nn}$ from the last row. The determinant is $a_{11}$ times $a_{22}$ times ··· times $a_{nn}$ times
det I. Then rule 1 (used at last!) is det I = 1.

What if a diagonal entry $a_{ii}$ is zero? Then the triangular A is singular. Elimination
produces a zero row. By rule 5 the determinant is unchanged, and by rule 6 a zero row
means det A = 0. We reach the great test for singular or invertible matrices.


8   If A is singular then det A = 0. If A is invertible then det A ≠ 0.

Singular   $\begin{bmatrix} a & b \\ c & d \end{bmatrix}$ is singular if and only if ad − bc = 0.


Proof   Elimination goes from A to U. If A is singular then U has a zero row. The rules
give det A = det U = 0. If A is invertible then U has the pivots along its diagonal. The
product of nonzero pivots (using rule 7) gives a nonzero determinant:

Multiply pivots   det A = ± det U = ± (product of the pivots).   (2)

The pivots of a 2 by 2 matrix (if a ≠ 0) are a and d − (c/a)b:

The determinant is   $$\begin{vmatrix} a & b \\ c & d \end{vmatrix} = \begin{vmatrix} a & b \\ 0 & d - \frac{c}{a}b \end{vmatrix} = ad - bc.$$

This is the first formula for the determinant. MATLAB multiplies the pivots to find
det A. The sign in ± det U depends on whether the number of row exchanges is even or
odd: +1 or −1 is the determinant of the permutation P that exchanges rows.

With no row exchanges, P = I and det A = det U = product of pivots. And det L = 1:

If PA = LU then det P det A = det L det U and det A = ± det U.   (3)
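The pivot formula (2) translates into a few lines of code. This Python sketch is an illustration under stated assumptions, not how any particular package does it: the name `det_by_pivots` is made up, exact fractions replace floating point, and a row exchange is made only when a zero sits in the pivot position.

```python
from fractions import Fraction

def det_by_pivots(rows):
    """Determinant as +/- (product of the pivots), using rules 2 and 5."""
    U = [[Fraction(x) for x in row] for row in rows]
    n = len(U)
    sign = 1
    for k in range(n):
        # Look for a nonzero entry in column k, at or below row k.
        pivot_row = next((r for r in range(k, n) if U[r][k] != 0), None)
        if pivot_row is None:
            return Fraction(0)              # no pivot: the matrix is singular
        if pivot_row != k:
            U[k], U[pivot_row] = U[pivot_row], U[k]
            sign = -sign                    # rule 2: an exchange reverses the sign
        for r in range(k + 1, n):
            factor = U[r][k] / U[k][k]
            # Rule 5: subtracting a multiple of row k leaves det unchanged.
            U[r] = [x - factor * y for x, y in zip(U[r], U[k])]
    product = Fraction(sign)
    for k in range(n):
        product *= U[k][k]                  # rule 7 for the triangular U
    return product

print(det_by_pivots([[4, 8, 8], [0, 1, 1], [0, 0, 1]]))
print(det_by_pivots([[0, 1], [1, 0]]))
print(det_by_pivots([[1, 2], [2, 4]]))
```

The three calls use the triangular example from rule 3, a single row exchange, and a matrix with dependent rows.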
9   The determinant of AB is det A times det B: |AB| = |A| |B|.

Product rule   $$\begin{vmatrix} a & b \\ c & d \end{vmatrix} \begin{vmatrix} p & q \\ r & s \end{vmatrix} = \begin{vmatrix} ap + br & aq + bs \\ cp + dr & cq + ds \end{vmatrix}.$$

When the matrix B is $A^{-1}$, this rule says that the determinant of $A^{-1}$ is 1/det A:

A times $A^{-1}$   $AA^{-1} = I$ so $(\det A)(\det A^{-1}) = \det I = 1$.


This product rule is the most intricate so far. Even the 2 by 2 case needs some algebra:

|A| |B| = (ad − bc)(ps − qr) = (ap + br)(cq + ds) − (aq + bs)(cp + dr) = |AB|.


For the n by n case, here is a snappy proof that |AB| = |A| |B|. When |B| is not zero,
consider the ratio D(A) = |AB|/|B|. Check that this ratio D(A) has properties 1, 2, 3.
Then D(A) has to be the determinant and we have |AB|/|B| = |A|. Good.


Property 1 (Determinant of I)   If A = I then the ratio D(A) becomes |B|/|B| = 1.


Property 2 (Sign reversal)   When two rows of A are exchanged, so are the same two
rows of AB. Therefore |AB| changes sign and so does the ratio |AB|/|B|.


Property 3 (Linearity)   When row 1 of A is multiplied by t, so is row 1 of AB. This
multiplies the determinant |AB| by t. So the ratio |AB|/|B| is multiplied by t.


Add row 1 of A to row 1 of A′. Then row 1 of AB adds to row 1 of A′B.
By rule 3, determinants add. After dividing by |B|, the ratios add, as desired.


Conclusion   This ratio |AB|/|B| has the same three properties that define |A|. Therefore
it equals |A|. This proves the product rule |AB| = |A| |B|. The case |B| = 0 is separate
and easy, because AB is singular when B is singular. Then |AB| = |A| |B| is 0 = 0.
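A numerical spot check of the product rule, sketched in Python with the 2 by 2 formula. The matrices are arbitrary illustrative choices, and one case uses a singular B so that both sides are zero.

```python
def det2(M):
    """2 by 2 determinant ad - bc."""
    (a, b), (c, d) = M
    return a * d - b * c

def matmul2(A, B):
    """Product of two 2 by 2 matrices, rows times columns."""
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

A = [[2, 1], [1, 1]]            # det A = 1
B = [[1, 3], [2, 4]]            # det B = -2
B_singular = [[1, 2], [2, 4]]   # dependent rows, det = 0

AB = matmul2(A, B)
AB_singular = matmul2(A, B_singular)
print(det2(AB), det2(A) * det2(B))
print(det2(AB_singular))
```

A check like this is not a proof, but it catches sign mistakes in the 2 by 2 algebra above.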


10   The transpose $A^T$ has the same determinant as A.

Transpose   $\begin{vmatrix} a & b \\ c & d \end{vmatrix} = \begin{vmatrix} a & c \\ b & d \end{vmatrix}$   since both sides equal ad − bc.

The equation $|A^T| = |A|$ becomes 0 = 0 when A is singular (we know that $A^T$ is also
singular). Otherwise A has the usual factorization PA = LU. Transposing both sides
gives $A^TP^T = U^TL^T$. The proof of $|A| = |A^T|$ comes by using rule 9 for products:


Compare   det P det A = det L det U   with   $\det A^T \det P^T = \det U^T \det L^T$.


First, $\det L = \det L^T = 1$ (both have 1's on the diagonal). Second, $\det U = \det U^T$ (those
triangular matrices have the same diagonal). Third, $\det P = \det P^T$ (permutations have
$P^TP = I$, so $|P^T||P| = 1$ by rule 9; thus $|P|$ and $|P^T|$ both equal 1 or both equal −1).
So L, U, P have the same determinants as $L^T, U^T, P^T$ and this leaves $\det A = \det A^T$.
Important comment on columns   Every rule for the rows can apply to the columns (just
by transposing, since $|A| = |A^T|$). The determinant changes sign when two columns are
exchanged. A zero column or two equal columns will make the determinant zero. If a
column is multiplied by t, so is the determinant. The determinant is a linear function of
each column separately.

It is time to stop. The list of properties is long enough. Next we find and use an explicit 
formula for the determinant. 


= REVIEW OF THE KEY IDEAS = 


1. The determinant is defined by det I = 1, sign reversal, and linearity in each row.
2. After elimination det A is ± (product of the pivots).
3. The determinant is zero exactly when A is not invertible.


4. Two remarkable properties are det AB = (det A)(det B) and $\det A^T = \det A$.


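= WORKED EXAMPLES =


5.1 A   Apply these operations to A and find the determinants of $M_1, M_2, M_3$:
In $M_1$, multiplying each $a_{ij}$ by $(-1)^{i+j}$ gives a checkerboard sign pattern.
In $M_2$, rows 1, 2, 3 of A are subtracted from rows 2, 3, 1.
In $M_3$, rows 1, 2, 3 of A are added to rows 2, 3, 1.

How are the determinants of $M_1, M_2, M_3$ related to the determinant of A?

$$M_1 = \begin{bmatrix} a_{11} & -a_{12} & a_{13} \\ -a_{21} & a_{22} & -a_{23} \\ a_{31} & -a_{32} & a_{33} \end{bmatrix} \quad M_2 = \begin{bmatrix} \text{row 1} - \text{row 3} \\ \text{row 2} - \text{row 1} \\ \text{row 3} - \text{row 2} \end{bmatrix} \quad M_3 = \begin{bmatrix} \text{row 1} + \text{row 3} \\ \text{row 2} + \text{row 1} \\ \text{row 3} + \text{row 2} \end{bmatrix}$$

Solution   The three determinants are det A, 0, and 2 det A. Here are reasons:

$$M_1 = \begin{bmatrix} 1 & & \\ & -1 & \\ & & 1 \end{bmatrix} \begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{bmatrix} \begin{bmatrix} 1 & & \\ & -1 & \\ & & 1 \end{bmatrix} \quad\text{so}\quad \det M_1 = (-1)(\det A)(-1).$$

$M_2$ is singular because its rows add to the zero row. Its determinant is zero.
$M_3$ can be split into eight matrices by Rule 3 (linearity in each row separately):

$$\begin{vmatrix} \text{row 1} + \text{row 3} \\ \text{row 2} + \text{row 1} \\ \text{row 3} + \text{row 2} \end{vmatrix} = \begin{vmatrix} \text{row 1} \\ \text{row 2} \\ \text{row 3} \end{vmatrix} + \begin{vmatrix} \text{row 3} \\ \text{row 2} \\ \text{row 3} \end{vmatrix} + \begin{vmatrix} \text{row 1} \\ \text{row 1} \\ \text{row 3} \end{vmatrix} + \cdots + \begin{vmatrix} \text{row 3} \\ \text{row 1} \\ \text{row 2} \end{vmatrix}$$

All but the first and last have repeated rows and zero determinant. The first is A and the
last has two row exchanges. So $\det M_3 = \det A + \det A$. (Try A = I.)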
5.1 B   Explain how to reach this determinant by row operations:

$$\det \begin{bmatrix} 1-a & 1 & 1 \\ 1 & 1-a & 1 \\ 1 & 1 & 1-a \end{bmatrix} = a^2(3 - a). \qquad (4)$$

Solution   Subtract row 3 from row 1 and then from row 2. This leaves

$$\det \begin{bmatrix} -a & 0 & a \\ 0 & -a & a \\ 1 & 1 & 1-a \end{bmatrix}.$$

Now add column 1 to column 3, and also column 2 to column 3. This leaves a lower
triangular matrix with −a, −a, 3 − a on the diagonal: det = (−a)(−a)(3 − a).

The determinant is zero if a = 0 or a = 3. For a = 0 we have the all-ones matrix,
certainly singular. For a = 3, each row adds to zero, again singular. Those numbers 0
and 3 are the eigenvalues of the all-ones matrix. This example is revealing and important,
leading toward Chapter 6.
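Formula (4) can be spot-checked for several values of a. The Python sketch below uses the explicit 3 by 3 expansion along the first row, which is an assumption borrowed from the formulas of the next section rather than from the rules used so far; the names `det3` and `shifted_ones` are hypothetical.

```python
def det3(M):
    """3 by 3 determinant expanded along the first row."""
    (a, b, c), (d, e, f), (g, h, i) = M
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

def shifted_ones(a):
    """The all-ones matrix with a subtracted from each diagonal entry."""
    return [[1 - a if i == j else 1 for j in range(3)] for i in range(3)]

# Compare the determinant with a^2 (3 - a) for a range of integers a.
results = [(a, det3(shifted_ones(a)), a * a * (3 - a)) for a in range(-3, 6)]
for a, computed, formula in results:
    print(a, computed, formula)
```

The two columns of numbers should agree, with zeros at a = 0 and a = 3.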


Problem Set 5.1 


Questions 1-12 are about the rules for determinants. 


1   If a 4 by 4 matrix has det A = 4, find det(2A) and det(−A) and det($A^2$) and
det($A^{-1}$).

2   If a 3 by 3 matrix has det A = −1, find det(5A) and det(−A) and det($A^2$) and
det($A^{-1}$).

3 True or false, with a reason if true or a counterexample if false: 


(a) The determinant of I + A is 1 + det A.
(b) The determinant of ABC is |A| |B| |C|.
(c) The determinant of 4A is 4|A|.


(d) The determinant of AB — BA is zero. Try an example with A = : 4 | ! 


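4 Which row exchanges show that these “reverse identity matrices” $J_3$ and $J_4$ have
$|J_3| = -1$ but $|J_4| = +1$?


$$\det \begin{bmatrix} 0 & 0 & 1 \\ 0 & 1 & 0 \\ 1 & 0 & 0 \end{bmatrix} = -1 \quad \text{but} \quad \det \begin{bmatrix} 0 & 0 & 0 & 1 \\ 0 & 0 & 1 & 0 \\ 0 & 1 & 0 & 0 \\ 1 & 0 & 0 & 0 \end{bmatrix} = +1.$$

5 For n = 5, 6, 7, count the row exchanges to permute the reverse identity $J_n$ to the
identity matrix $I_n$. Propose a rule for every size n and predict whether $J_{101}$ has
determinant +1 or -1.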

6 Show how Rule 6 (determinant = 0 if a row is all zero) comes from Rule 3.


7 Find the determinants of rotations and reflections:


$$Q = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix} \quad \text{and} \quad Q = \begin{bmatrix} 1 - 2\cos^2\theta & -2\cos\theta\sin\theta \\ -2\cos\theta\sin\theta & 1 - 2\sin^2\theta \end{bmatrix}.$$


8 Prove that every orthogonal matrix ($Q^TQ = I$) has determinant 1 or -1.


(a) Use the product rule |AB| = |A| |B| and the transpose rule $|Q| = |Q^T|$.


(b) Use only the product rule. If |det Q| > 1 then $\det Q^n = (\det Q)^n$ blows up.
How do you know this can’t happen to $Q^n$?


9 Do these matrices have determinant 0, 1, 2, or 3?


$$A = \begin{bmatrix} 0 & 0 & 1 \\ 1 & 0 & 0 \\ 0 & 1 & 0 \end{bmatrix} \qquad B = \begin{bmatrix} 0 & 1 & 1 \\ 1 & 0 & 1 \\ 1 & 1 & 0 \end{bmatrix} \qquad C = \begin{bmatrix} 1 & 1 & 1 \\ 1 & 1 & 1 \\ 1 & 1 & 1 \end{bmatrix}$$


10 If the entries in every row of A add to zero, solve Ax = 0 to prove det A = 0. If
those entries add to one, show that det(A - I) = 0. Does this mean det A = 1?


11 Suppose that CD = -DC and find the flaw in this reasoning: Taking determinants
gives |C||D| = -|D||C|. Therefore |C| = 0 or |D| = 0. One or both of the
matrices must be singular. (That is not true.)


12 The inverse of a 2 by 2 matrix seems to have determinant = 1:


$$\det A^{-1} = \det \frac{1}{ad - bc} \begin{bmatrix} d & -b \\ -c & a \end{bmatrix} = \frac{ad - bc}{ad - bc} = 1.$$


What is wrong with this calculation? What is the correct $\det A^{-1}$?


Questions 13-27 use the rules to compute specific determinants. 


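13 Reduce A to U and find det A = product of the pivots:

$$A = \begin{bmatrix} 1 & 1 & 1 \\ 1 & 2 & 2 \\ 1 & 2 & 3 \end{bmatrix} \qquad A = \begin{bmatrix} 1 & 2 & 3 \\ 2 & 2 & 3 \\ 3 & 3 & 3 \end{bmatrix}$$


14 By applying row operations to produce an upper triangular U, compute


$$\det \begin{bmatrix} 1 & 2 & 3 & 0 \\ 2 & 6 & 6 & 1 \\ -1 & 0 & 0 & 3 \\ 0 & 2 & 0 & 7 \end{bmatrix} \quad \text{and} \quad \det \begin{bmatrix} 2 & -1 & 0 & 0 \\ -1 & 2 & -1 & 0 \\ 0 & -1 & 2 & -1 \\ 0 & 0 & -1 & 2 \end{bmatrix}.$$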
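
15 Use row operations to simplify and compute these determinants:


$$\det \begin{bmatrix} 101 & 201 & 301 \\ 102 & 202 & 302 \\ 103 & 203 & 303 \end{bmatrix} \quad \text{and} \quad \det \begin{bmatrix} 1 & t & t^2 \\ t & 1 & t \\ t^2 & t & 1 \end{bmatrix}.$$


16 Find the determinants of a rank one matrix and a skew-symmetric matrix:


$$A = \begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix} \begin{bmatrix} 1 & -4 & 5 \end{bmatrix} \quad \text{and} \quad A = \begin{bmatrix} 0 & 1 & 3 \\ -1 & 0 & 4 \\ -3 & -4 & 0 \end{bmatrix}.$$


17 A skew-symmetric matrix has $A^T = -A$. Insert a, b, c for 1, 3, 4 in Question 16 and
show that |A| = 0. Write down a 4 by 4 example with |A| = 1.


18 Use row operations to show that the 3 by 3 “Vandermonde determinant” is


$$\det \begin{bmatrix} 1 & a & a^2 \\ 1 & b & b^2 \\ 1 & c & c^2 \end{bmatrix} = (b - a)(c - a)(c - b).$$


19 Find the determinants of U and $U^{-1}$ and $U^2$:


$$U = \begin{bmatrix} 1 & 4 & 6 \\ 0 & 2 & 5 \\ 0 & 0 & 3 \end{bmatrix} \quad \text{and} \quad U = \begin{bmatrix} a & b \\ 0 & d \end{bmatrix}.$$


20 Suppose you do two row operations at once, going from

$$\begin{bmatrix} a & b \\ c & d \end{bmatrix} \quad \text{to} \quad \begin{bmatrix} a - Lc & b - Ld \\ c - \ell a & d - \ell b \end{bmatrix}.$$


Find the second determinant. Does it equal ad - bc?


21 Row exchange: Add row 1 of A to row 2, then subtract row 2 from row 1. Then add
row 1 to row 2 and multiply row 1 by -1 to reach B. Which rules show


$$\det B = \begin{vmatrix} c & d \\ a & b \end{vmatrix} \quad \text{equals} \quad -\det A = -\begin{vmatrix} a & b \\ c & d \end{vmatrix}?$$


Those rules could replace Rule 2 in the definition of the determinant.


22 From ad - bc, find the determinants of A and $A^{-1}$ and $A - \lambda I$:


$$A = \begin{bmatrix} 2 & 1 \\ 1 & 2 \end{bmatrix} \quad \text{and} \quad A^{-1} = \frac{1}{3} \begin{bmatrix} 2 & -1 \\ -1 & 2 \end{bmatrix} \quad \text{and} \quad A - \lambda I = \begin{bmatrix} 2 - \lambda & 1 \\ 1 & 2 - \lambda \end{bmatrix}.$$


Which two numbers $\lambda$ lead to $\det(A - \lambda I) = 0$? Write down the matrix $A - \lambda I$ for
each of those numbers $\lambda$. It should not be invertible.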

23 From $A = \begin{bmatrix} 4 & 1 \\ 2 & 3 \end{bmatrix}$ find $A^2$ and $A^{-1}$ and $A - \lambda I$ and their determinants. Which two
numbers $\lambda$ lead to $\det(A - \lambda I) = 0$?


24 Elimination reduces A to U. Then A = LU:


$$A = \begin{bmatrix} 3 & 3 & 4 \\ 6 & 8 & 7 \\ -3 & 5 & -9 \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 \\ 2 & 1 & 0 \\ -1 & 4 & 1 \end{bmatrix} \begin{bmatrix} 3 & 3 & 4 \\ 0 & 2 & -1 \\ 0 & 0 & -1 \end{bmatrix} = LU.$$


Find the determinants of L, U, A, $U^{-1}L^{-1}$, and $U^{-1}L^{-1}A$.


25 If the i, j entry of A is i times j, show that det A = 0. (Exception when A = [1].)


26 If the i, j entry of A is i + j, show that det A = 0. (Exception when n = 1 or 2.)


27 Compute the determinants of these matrices by row operations:


$$A = \begin{bmatrix} 0 & a & 0 \\ 0 & 0 & b \\ c & 0 & 0 \end{bmatrix} \quad \text{and} \quad B = \begin{bmatrix} 0 & a & 0 & 0 \\ 0 & 0 & b & 0 \\ 0 & 0 & 0 & c \\ d & 0 & 0 & 0 \end{bmatrix} \quad \text{and} \quad C = \begin{bmatrix} a & a & a \\ a & b & b \\ a & b & c \end{bmatrix}.$$


28 True or false (give a reason if true or a 2 by 2 example if false):


(a) If A is not invertible then AB is not invertible.

(b) The determinant of A is always the product of its pivots.

(c) The determinant of A - B equals det A - det B.

(d) AB and BA have the same determinant.


29 What is wrong with this proof that projection matrices have det P = 1?

$$P = A(A^TA)^{-1}A^T \quad \text{so} \quad |P| = |A| \, \frac{1}{|A^T|\,|A|} \, |A^T| = 1.$$


30 (Calculus question) Show that the partial derivatives of ln(det A) give $A^{-1}$!


$$f(a, b, c, d) = \ln(ad - bc) \quad \text{leads to} \quad \begin{bmatrix} \partial f/\partial a & \partial f/\partial c \\ \partial f/\partial b & \partial f/\partial d \end{bmatrix} = A^{-1}.$$


31 (MATLAB) The Hilbert matrix hilb(n) has i, j entry equal to 1/(i + j - 1). Print
the determinants of hilb(1), hilb(2), ..., hilb(10). Hilbert matrices are hard to work
with! What are the pivots of hilb(5)?


32 (MATLAB) What is a typical determinant (experimentally) of rand(n) and randn(n)
for n = 50, 100, 200, 400? (And what does “Inf” mean in MATLAB?)


33 (MATLAB) Find the largest determinant of a 6 by 6 matrix of 1’s and -1’s.


34 If you know that det A = 6, what is the determinant of B?


$$\text{From } \det A = \begin{vmatrix} \text{row 1} \\ \text{row 2} \\ \text{row 3} \end{vmatrix} = 6 \text{ find } \det B = \begin{vmatrix} \text{row 3} + \text{row 2} + \text{row 1} \\ \text{row 2} + \text{row 1} \\ \text{row 1} \end{vmatrix}.$$


5.2 Permutations and Cofactors


1 2 by 2: ad - bc has 2! terms with ± signs. n by n: det A adds n! terms with ± signs.


2 For n = 3, det A adds 3! = 6 terms. Two terms are $+a_{12}a_{23}a_{31}$ and $-a_{13}a_{22}a_{31}$.
Rows 1, 2, 3 and columns 1, 2, 3 appear once in each term.


3 That minus sign came because the column order 3, 2, 1 needs one exchange to recover 1, 2, 3.


4 The six terms include $+a_{11}a_{22}a_{33} - a_{11}a_{23}a_{32} = a_{11}(a_{22}a_{33} - a_{23}a_{32}) = a_{11}$ (cofactor $C_{11}$).


5 Always $\det A = a_{11}C_{11} + a_{12}C_{12} + \cdots + a_{1n}C_{1n}$. Cofactors are determinants of size n - 1.


A computer finds the determinant from the pivots. This section explains two other ways 
to do it. There is a “big formula” using all n! permutations. There is a “cofactor formula” 
using determinants of size n — 1. The best example is my favorite 4 by 4 matrix: 


$$A = \begin{bmatrix} 2 & -1 & 0 & 0 \\ -1 & 2 & -1 & 0 \\ 0 & -1 & 2 & -1 \\ 0 & 0 & -1 & 2 \end{bmatrix} \quad \text{has} \quad \det A = 5.$$


We can find this determinant in all three ways: pivots, big formula, cofactors. 


1. The product of the pivots is $2 \cdot \frac{3}{2} \cdot \frac{4}{3} \cdot \frac{5}{4}$. Cancellation produces 5.


2. The “big formula” in equation (8) has 4! = 24 terms. Only five terms are nonzero:
det A = 16 - 4 - 4 - 4 + 1 = 5.


The 16 comes from 2 - 2 - 2 - 2 on the diagonal of A. Where do —4 and +1 come 
from? When you can find those five terms, you have understood formula (8). 


3. The numbers 2, -1, 0, 0 in the first row multiply their cofactors 4, 3, 2, 1 from the
other rows. That gives 2 · 4 - 1 · 3 = 5. Those cofactors are 3 by 3 determinants.
Cofactors use the rows and columns that are not used by the entry in the first row. 
Every term in a determinant uses each row and column once! 
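
One way to find those five terms is to let a short program try all 24 column orders. This is a sketch only: the helper `sign` (parity from counting inversions) is an illustrative choice, not the book's notation.

```python
from itertools import permutations

A = [[2, -1, 0, 0],
     [-1, 2, -1, 0],
     [0, -1, 2, -1],
     [0, 0, -1, 2]]

def sign(p):
    """+1 for an even permutation, -1 for an odd one (parity of the inversion count)."""
    inversions = sum(p[i] > p[j] for i in range(len(p)) for j in range(i + 1, len(p)))
    return -1 if inversions % 2 else 1

terms = []
for p in permutations(range(4)):        # p[row] is the column chosen from that row
    product = 1
    for row, col in enumerate(p):
        product *= A[row][col]
    if product != 0:
        # store the column order with 1-based numbers, as in the text
        terms.append((tuple(c + 1 for c in p), sign(p) * product))

for columns, value in terms:
    print(columns, value)
print("det A =", sum(value for _, value in terms))
```

The column orders that survive are the identity, the three single exchanges of neighbors, and the double exchange (2, 1, 4, 3).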


The Pivot Formula 


When elimination leads to A = LU, the pivots $d_1, \ldots, d_n$ are on the diagonal of the
upper triangular U. If no row exchanges are involved, multiply those pivots to find the
determinant:

$$\det A = (\det L)(\det U) = (1)(d_1 d_2 \cdots d_n). \tag{1}$$


This formula for det A appeared in Section 5.1, with the further possibility of row
exchanges. Then a permutation enters PA = LU. The determinant of P is -1 or +1.

$$(\det P)(\det A) = (\det L)(\det U) \quad \text{gives} \quad \det A = \pm (d_1 d_2 \cdots d_n). \tag{2}$$


Example 1 A row exchange produces pivots 4, 2, 1 and that important minus sign: 


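$$A = \begin{bmatrix} 0 & 0 & 1 \\ 0 & 2 & 3 \\ 4 & 5 & 6 \end{bmatrix} \qquad PA = \begin{bmatrix} 4 & 5 & 6 \\ 0 & 2 & 3 \\ 0 & 0 & 1 \end{bmatrix} \qquad \det A = -(4)(2)(1) = -8.$$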
The odd number of row exchanges (namely one exchange) means that det P = —1. 


The next example has no row exchanges. It may be the first matrix we factored into 
LU (when it was 3 by 3). What is remarkable is that we can go directly to n by n. Pivots 
give the determinant. We will also see how determinants give the pivots. 


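Example 2 The first pivots of this tridiagonal matrix A are $2, \frac{3}{2}, \frac{4}{3}$. The next are $\frac{5}{4}$ and
$\frac{6}{5}$ and eventually $\frac{n+1}{n}$. Factoring this n by n matrix reveals its determinant:


$$\begin{bmatrix} 2 & -1 & & & \\ -1 & 2 & -1 & & \\ & -1 & 2 & \cdot & \\ & & \cdot & \cdot & -1 \\ & & & -1 & 2 \end{bmatrix} = \begin{bmatrix} 1 & & & & \\ -\frac{1}{2} & 1 & & & \\ & -\frac{2}{3} & 1 & & \\ & & \cdot & \cdot & \\ & & & -\frac{n-1}{n} & 1 \end{bmatrix} \begin{bmatrix} 2 & -1 & & & \\ & \frac{3}{2} & -1 & & \\ & & \frac{4}{3} & -1 & \\ & & & \cdot & \cdot \\ & & & & \frac{n+1}{n} \end{bmatrix}$$


The pivots are on the diagonal of U (the last matrix). When 2 and $\frac{3}{2}$ and $\frac{4}{3}$ and $\frac{5}{4}$ are
multiplied, the fractions cancel. The determinant of the 4 by 4 matrix is 5. The 3 by 3
determinant is 4. The n by n determinant is n + 1:


$$\text{-1, 2, -1 matrix} \qquad \det A = (2)\left(\tfrac{3}{2}\right)\left(\tfrac{4}{3}\right) \cdots \left(\tfrac{n+1}{n}\right) = n + 1.$$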


Important point: The first pivots depend only on the upper left corner of the original 
matrix A. This is a rule for all matrices without row exchanges. 


The first k pivots come from the k by k matrix $A_k$ in the top left corner of A.
The determinant of that corner submatrix $A_k$ is $d_1 d_2 \cdots d_k$ (first k pivots).


The 1 by 1 matrix $A_1$ contains the very first pivot $d_1$. This is $\det A_1$. The 2 by 2 matrix in
the corner has $\det A_2 = d_1 d_2$. Eventually the n by n determinant multiplies all n pivots.

Elimination deals with the matrix $A_k$ in the upper left corner while starting on the whole
matrix. We assume no row exchanges, so that A = LU and $A_k = L_k U_k$. Dividing one
determinant by the previous determinant ($\det A_k$ divided by $\det A_{k-1}$) cancels everything
but the latest pivot $d_k$. Each pivot is a ratio of determinants:


Pivots from determinants: The kth pivot is

$$d_k = \frac{d_1 d_2 \cdots d_k}{d_1 d_2 \cdots d_{k-1}} = \frac{\det A_k}{\det A_{k-1}}. \tag{3}$$


We don’t need row exchanges when all the upper left submatrices have $\det A_k \neq 0$.
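
Equation (3) can be tested on the 4 by 4 matrix from the start of this section. The sketch below assumes no row exchanges are needed; the function names `pivots` and `det` are illustrative, and `det` uses cofactors of row 1 so that it is independent of elimination.

```python
from fractions import Fraction

def pivots(a):
    """Pivots from elimination, assuming no row exchanges are needed."""
    m = [[Fraction(x) for x in row] for row in a]
    n = len(m)
    for k in range(n):
        for r in range(k + 1, n):
            factor = m[r][k] / m[k][k]
            m[r] = [x - factor * y for x, y in zip(m[r], m[k])]
    return [m[k][k] for k in range(n)]

def det(a):
    """Determinant by cofactors of row 1 (does not use elimination)."""
    if len(a) == 1:
        return Fraction(a[0][0])
    total = Fraction(0)
    for j, entry in enumerate(a[0]):
        minor = [row[:j] + row[j + 1:] for row in a[1:]]
        total += (-1) ** j * entry * det(minor)
    return total

A = [[2, -1, 0, 0],
     [-1, 2, -1, 0],
     [0, -1, 2, -1],
     [0, 0, -1, 2]]

# det A_k for the upper left k by k corners, k = 1, ..., 4
minors = [det([row[:k] for row in A[:k]]) for k in range(1, 5)]
# d_1 = det A_1, and d_k = det A_k / det A_{k-1} afterwards
ratios = [minors[0]] + [minors[k] / minors[k - 1] for k in range(1, 4)]

print(minors)
print(ratios)
print(pivots(A))
```

The corner determinants 2, 3, 4, 5 give the ratios 2, 3/2, 4/3, 5/4, which are the pivots.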


The Big Formula for Determinants


Pivots are good for computing. They concentrate a lot of information, enough to find the
determinant. But it is hard to connect them to the original $a_{ij}$. That part will be clearer if
we go back to rules 1-2-3, linearity and sign reversal and det I = 1. We want to derive a
single explicit formula for the determinant, directly from the entries $a_{ij}$.

The formula has n! terms. Its size grows fast because n! = 1, 2,6, 24,120,.... For 
n = 11 there are about forty million terms. For n = 2, the two terms are ad and bc. Half 
the terms have minus signs (as in —bc). The other half have plus signs (as in ad). For n = 3 
there are 3! = (3)(2)(1) terms. Here are those six terms: 


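3 by 3 determinant:

$$\begin{vmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{vmatrix} = \begin{array}{l} +a_{11}a_{22}a_{33} + a_{12}a_{23}a_{31} + a_{13}a_{21}a_{32} \\ -a_{11}a_{23}a_{32} - a_{12}a_{21}a_{33} - a_{13}a_{22}a_{31}. \end{array} \tag{4}$$
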
Notice the pattern. Each product like a11a23a32 has one entry from each row. It also has 
one entry from each column. The column order 1, 3, 2 means that this particular term 
comes with a minus sign. The column order 3, 1, 2 in a13a21a32 has a plus sign (boldface). 
It will be “permutations” that tell us the sign. 

The next step (n = 4) brings 4! = 24 terms. There are 24 ways to choose one entry 
from each row and column. Down the main diagonal, $a_{11}a_{22}a_{33}a_{44}$ with column order
1, 2,3, 4 always has a plus sign. That is the “identity permutation”. 

To derive the big formula I start with n = 2. The goal is to reach ad — bc in a systematic 
way. Break each row into two simpler rows: 


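$$[\,a \;\; b\,] = [\,a \;\; 0\,] + [\,0 \;\; b\,] \quad \text{and} \quad [\,c \;\; d\,] = [\,c \;\; 0\,] + [\,0 \;\; d\,].$$


Now apply linearity, first in row 1 (with row 2 fixed) and then in row 2 (with row 1 fixed):


$$\begin{vmatrix} a & b \\ c & d \end{vmatrix} = \begin{vmatrix} a & 0 \\ c & d \end{vmatrix} + \begin{vmatrix} 0 & b \\ c & d \end{vmatrix} \quad \text{(break up row 1)}$$

$$= \begin{vmatrix} a & 0 \\ c & 0 \end{vmatrix} + \begin{vmatrix} a & 0 \\ 0 & d \end{vmatrix} + \begin{vmatrix} 0 & b \\ c & 0 \end{vmatrix} + \begin{vmatrix} 0 & b \\ 0 & d \end{vmatrix} \quad \text{(break up row 2).}$$


The last line has $2^2 = 4$ determinants. The first and fourth are zero because one row is a
multiple of the other row. We are left with 2! = 2 determinants to compute:


$$\begin{vmatrix} a & 0 \\ 0 & d \end{vmatrix} + \begin{vmatrix} 0 & b \\ c & 0 \end{vmatrix} = ad \begin{vmatrix} 1 & 0 \\ 0 & 1 \end{vmatrix} + bc \begin{vmatrix} 0 & 1 \\ 1 & 0 \end{vmatrix} = ad - bc.$$
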
The splitting led to permutation matrices. Their determinants give a plus or minus sign. 
The permutation tells the column sequence. In this case the column order is (1, 2) or (2, 1). 


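Now try n = 3. Each row splits into 3 simpler rows like $[\,a_{11} \;\; 0 \;\; 0\,]$. Using linearity in
each row, det A splits into $3^3 = 27$ simple determinants. If a column choice is repeated
(for example if we also choose the row $[\,a_{21} \;\; 0 \;\; 0\,]$) then the simple determinant is zero.
We pay attention only when the entries $a_{ij}$ come from different columns, like (3, 1, 2):


Six terms:

$$\begin{vmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{vmatrix} = \begin{vmatrix} a_{11} & & \\ & a_{22} & \\ & & a_{33} \end{vmatrix} + \begin{vmatrix} & a_{12} & \\ & & a_{23} \\ a_{31} & & \end{vmatrix} + \begin{vmatrix} & & a_{13} \\ a_{21} & & \\ & a_{32} & \end{vmatrix}$$

$$+ \begin{vmatrix} a_{11} & & \\ & & a_{23} \\ & a_{32} & \end{vmatrix} + \begin{vmatrix} & a_{12} & \\ a_{21} & & \\ & & a_{33} \end{vmatrix} + \begin{vmatrix} & & a_{13} \\ & a_{22} & \\ a_{31} & & \end{vmatrix}$$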


There are 3! = 6 ways to order the columns, so six determinants. The six permutations 
of (1, 2, 3) include the identity permutation (1, 2, 3) from P = I.


Column numbers = (1,2,3), (2,3, 1), (3, 1, 2), (1,3, 2), (2,1,3), (3,2,1). (6) 


The last three are odd permutations (one exchange). The first three are even permutations 
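(0 or 2 exchanges). When the column sequence is (3, 1, 2), we have chosen the entries
$a_{13}a_{21}a_{32}$. That particular column sequence comes with a plus sign (2 exchanges). The
determinant of A is now split into six simple terms. Factor out the $a_{ij}$:


$$\det A = a_{11}a_{22}a_{33} \begin{vmatrix} 1 & & \\ & 1 & \\ & & 1 \end{vmatrix} + a_{12}a_{23}a_{31} \begin{vmatrix} & 1 & \\ & & 1 \\ 1 & & \end{vmatrix} + a_{13}a_{21}a_{32} \begin{vmatrix} & & 1 \\ 1 & & \\ & 1 & \end{vmatrix}$$

$$+ \; a_{11}a_{23}a_{32} \begin{vmatrix} 1 & & \\ & & 1 \\ & 1 & \end{vmatrix} + a_{12}a_{21}a_{33} \begin{vmatrix} & 1 & \\ 1 & & \\ & & 1 \end{vmatrix} + a_{13}a_{22}a_{31} \begin{vmatrix} & & 1 \\ & 1 & \\ 1 & & \end{vmatrix}. \tag{7}$$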


The first three (even) permutations have det P = +1, the last three (odd) permutations 
have det P = —1. We have proved the 3 by 3 formula in a systematic way. 
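
The signs of the six column orders in (6) can be recovered by counting exchanges. This is a minimal sketch; `exchanges_to_sort` is a hypothetical helper that puts 1, 2, ..., n in place one position at a time.

```python
from itertools import permutations

def exchanges_to_sort(order):
    """Count the exchanges used to bring a column order back to (1, 2, ..., n)."""
    p = list(order)
    count = 0
    for i in range(len(p)):
        if p[i] != i + 1:
            j = p.index(i + 1)          # where the number i+1 currently sits
            p[i], p[j] = p[j], p[i]     # one exchange puts it in position i
            count += 1
    return count

signs = {p: (-1) ** exchanges_to_sort(p) for p in permutations((1, 2, 3))}
for p, s in signs.items():
    print(p, exchanges_to_sort(p), s)

# The 5 by 5 column order (5, 2, 3, 4, 1) needs a single exchange of 5 and 1.
print(exchanges_to_sort((5, 2, 3, 4, 1)))
```

Other ways of sorting may use a different number of exchanges, but the count stays even or stays odd, and only that parity decides det P.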

Now you can see the n by n formula. There are n! orderings of the columns. The 
columns (1, 2, ..., n) go in each possible order $(\alpha, \beta, \ldots, \omega)$. Taking $a_{1\alpha}$ from row 1
and $a_{2\beta}$ from row 2 and eventually $a_{n\omega}$ from row n, the determinant contains the product
$a_{1\alpha}a_{2\beta} \cdots a_{n\omega}$ times +1 or -1. Half the column orderings have sign -1.

The determinant of A is the sum of these n! simple determinants, times 1 or -1.
The simple determinants $a_{1\alpha}a_{2\beta} \cdots a_{n\omega}$ choose one entry from every row and column.
For 5 by 5, the term $a_{15}a_{22}a_{33}a_{44}a_{51}$ would have det P = -1 from exchanging 5 and 1.


det A = sum over all n! column permutations $P = (\alpha, \beta, \ldots, \omega)$

$$\det A = \sum (\det P)\, a_{1\alpha}a_{2\beta} \cdots a_{n\omega} = \textbf{BIG FORMULA}. \tag{8}$$


The 2 by 2 case is $+a_{11}a_{22} - a_{12}a_{21}$ (which is ad - bc). Here P is (1, 2) or (2, 1).

The 3 by 3 case has three products “down to the right” (see Problem 28) and three
products “down to the left”. Warning: Many people believe they should follow this pattern
in the 4 by 4 case. They only take 8 products, but we need 24.
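
The warning can be made concrete. In the sketch below, `diagonal_rule` is my reading of “this pattern” (n wrapped diagonals down to the right minus n wrapped diagonals down to the left), so treat it as an assumption; `big_formula` is equation (8) with the sign found by counting inversions. The 3 by 3 sample matrix is arbitrary.

```python
from itertools import permutations
from math import prod

def big_formula(a):
    """Sum over all n! column orders of (det P) times the chosen product."""
    n = len(a)
    total = 0
    for cols in permutations(range(n)):
        inversions = sum(cols[i] > cols[j] for i in range(n) for j in range(i + 1, n))
        sign = -1 if inversions % 2 else 1
        total += sign * prod(a[i][cols[i]] for i in range(n))
    return total

def diagonal_rule(a):
    """n products down to the right minus n products down to the left (2n products)."""
    n = len(a)
    right = sum(prod(a[i][(i + s) % n] for i in range(n)) for s in range(n))
    left = sum(prod(a[i][(s - i) % n] for i in range(n)) for s in range(n))
    return right - left

A3 = [[1, 2, 3], [4, 5, 6], [7, 8, 10]]      # arbitrary 3 by 3 sample
A4 = [[2, -1, 0, 0],
      [-1, 2, -1, 0],
      [0, -1, 2, -1],
      [0, 0, -1, 2]]

print(big_formula(A3), diagonal_rule(A3))    # the two agree for 3 by 3
print(big_formula(A4), diagonal_rule(A4))    # they disagree for 4 by 4
```

For the -1, 2, -1 matrix the eight-product pattern gives 16 - 1 = 15, while all 24 terms give the correct determinant 5.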


Example 3 (Determinant of U) When U is upper triangular, only one of the n! products
can be nonzero. This one term comes from the diagonal: $\det U = +u_{11}u_{22} \cdots u_{nn}$. All
other column orderings pick at least one entry below the diagonal, where U has zeros. As
soon as we pick a number like $u_{21} = 0$, that term in equation (8) is sure to be zero.

Of course det I = 1. The only nonzero term is $+(1)(1) \cdots (1)$ from the diagonal.


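Example 4 Suppose Z is the identity matrix except for column 3. Then


$$\text{The determinant of } Z = \begin{vmatrix} 1 & 0 & a & 0 \\ 0 & 1 & b & 0 \\ 0 & 0 & c & 0 \\ 0 & 0 & d & 1 \end{vmatrix} \text{ is } c. \tag{9}$$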


The term (1)(1)(c)(1) comes from the main diagonal with a plus sign. There are 4! = 24 
products (choosing one factor from each row and column) but the other 23 products are 
zero. Reason: If we pick a, b, or d from column 3, that column is used up. Then the only 
available choice from row 3 is zero. 

Here is a different reason for the same answer. If c = 0, then Z has a row of zeros and
det Z = c = 0 is correct. If c is not zero, use elimination. Subtract multiples of row 3 from
the other rows, to knock out a, b, d. That leaves a diagonal matrix and det Z = c.

This example will soon be used for “Cramer’s Rule”. If we move a, b, c, d into the first 
column of Z, the determinant is det Z = a. (Why?) Changing one column of I leaves Z 
with an easy determinant, coming from its main diagonal only. 


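Example 5 Suppose A has 1’s just above and below the main diagonal. Here n = 4:


$$A_4 = \begin{bmatrix} 0 & 1 & 0 & 0 \\ 1 & 0 & 1 & 0 \\ 0 & 1 & 0 & 1 \\ 0 & 0 & 1 & 0 \end{bmatrix} \quad \text{and} \quad P = \begin{bmatrix} 0 & 1 & 0 & 0 \\ 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 1 & 0 \end{bmatrix} \quad \text{have determinant 1.}$$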

The only nonzero choice in the first row is column 2. The only nonzero choice in row 4 is 
column 3. Then rows 2 and 3 must choose columns 1 and 4. In other words det P = det A. 
The determinant of P is +1 (two exchanges to reach 2,1,4,3). Therefore det A = +1. 


Determinant by Cofactors 


Formula (8) is a direct definition of the determinant. It gives you everything at once— 
but you have to digest it. Somehow this sum of n! terms must satisfy rules 1-2-3 (then 
all the other properties 4-10 will follow). The easiest is det I = 1, already checked.

When you separate out the factor $a_{11}$ or $a_{12}$ or $a_{1\alpha}$ that comes from the first row,
you see linearity. For 3 by 3, separate the usual 6 terms of the determinant into 3 pairs:


$$\det A = a_{11}(a_{22}a_{33} - a_{23}a_{32}) + a_{12}(a_{23}a_{31} - a_{21}a_{33}) + a_{13}(a_{21}a_{32} - a_{22}a_{31}). \tag{10}$$


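Those three quantities in parentheses are called “cofactors”. They are 2 by 2 determinants,
from rows 2 and 3. The first row contributes the factors $a_{11}, a_{12}, a_{13}$. The lower
rows contribute the cofactors $C_{11}, C_{12}, C_{13}$. Certainly the determinant
$a_{11}C_{11} + a_{12}C_{12} + a_{13}C_{13}$ depends linearly on $a_{11}, a_{12}, a_{13}$. This is Rule 3.


The cofactor of $a_{11}$ is $C_{11} = a_{22}a_{33} - a_{23}a_{32}$. You can see it in this splitting:


$$\begin{vmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{vmatrix} = \begin{vmatrix} a_{11} & & \\ & a_{22} & a_{23} \\ & a_{32} & a_{33} \end{vmatrix} + \begin{vmatrix} & a_{12} & \\ a_{21} & & a_{23} \\ a_{31} & & a_{33} \end{vmatrix} + \begin{vmatrix} & & a_{13} \\ a_{21} & a_{22} & \\ a_{31} & a_{32} & \end{vmatrix}$$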


We are still choosing one entry from each row and column. Since $a_{11}$ uses up row 1 and
column 1, that leaves a 2 by 2 determinant as its cofactor.


As always, we have to watch signs. The 2 by 2 determinant that goes with $a_{12}$ looks
like $a_{21}a_{33} - a_{23}a_{31}$. But in the cofactor $C_{12}$, its sign is reversed. Then $a_{12}C_{12}$ is the
correct 3 by 3 determinant. The sign pattern for cofactors along the first row is plus-minus-
plus-minus. You cross out row 1 and column j to get a submatrix $M_{1j}$ of size n - 1.
Multiply its determinant by the sign $(-1)^{1+j}$ to get the cofactor:


The cofactors along row 1 are $C_{1j} = (-1)^{1+j} \det M_{1j}$.


$$\text{The cofactor expansion is} \quad \det A = a_{11}C_{11} + a_{12}C_{12} + \cdots + a_{1n}C_{1n}. \tag{11}$$


In the big formula (8), the terms that multiply $a_{11}$ combine to give $C_{11} = \det M_{11}$. The
sign is $(-1)^{1+1}$, meaning plus. Equation (11) is another form of equation (8) and also
equation (10), with factors from row 1 multiplying cofactors that use only the other rows.


Note Whatever is possible for row 1 is possible for row i. The entries $a_{ij}$ in that row also
have cofactors $C_{ij}$. Those are determinants of order n - 1, multiplied by $(-1)^{i+j}$. Since
$a_{ij}$ accounts for row i and column j, the submatrix $M_{ij}$ throws out row i and column j.
The display shows $a_{43}$ and $M_{43}$ (with row 4 and column 3 removed). The sign $(-1)^{4+3}$
multiplies the determinant of $M_{43}$ to give $C_{43}$. The sign matrix shows the ± pattern:


$$A = \begin{bmatrix} \bullet & \bullet & & \bullet \\ \bullet & \bullet & & \bullet \\ \bullet & \bullet & & \bullet \\ & & a_{43} & \end{bmatrix} \qquad \text{signs } (-1)^{i+j} = \begin{bmatrix} + & - & + & - \\ - & + & - & + \\ + & - & + & - \\ - & + & - & + \end{bmatrix}$$


The determinant is the dot product of any row i of A with its cofactors using other rows:

$$\textbf{COFACTOR FORMULA} \qquad \det A = a_{i1}C_{i1} + a_{i2}C_{i2} + \cdots + a_{in}C_{in}. \tag{12}$$


Each cofactor $C_{ij}$ (order n - 1, without row i and column j) includes its correct sign:


Cofactor $C_{ij} = (-1)^{i+j} \det M_{ij}$.


A determinant of order n is a combination of determinants of order n — 1. A recursive 
person would keep going. Each subdeterminant breaks into determinants of order n — 2. 
We could define all determinants via equation (12). This rule goes from order n to n — 1 
to n — 2 and eventually to order 1. Define the 1 by 1 determinant |a| to be the number a. 
Then the cofactor method is complete. 
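
The recursive definition translates almost word for word into a short program. This is a sketch, not an efficient method (it does on the order of n! work); the function name `cofactor_det` and the choice to expand along row 1 below the top level are assumptions made for illustration.

```python
def cofactor_det(a, i=0):
    """det A = a_i1 C_i1 + ... + a_in C_in, expanding along row i (counted from 0)."""
    n = len(a)
    if n == 1:
        return a[0][0]                  # the 1 by 1 determinant |a| is the number a
    total = 0
    for j in range(n):
        if a[i][j] == 0:
            continue                    # a zero entry needs no cofactor
        # M_ij: throw out row i and column j
        minor = [row[:j] + row[j + 1:] for r, row in enumerate(a) if r != i]
        total += (-1) ** (i + j) * a[i][j] * cofactor_det(minor)
    return total

A = [[2, -1, 0, 0],
     [-1, 2, -1, 0],
     [0, -1, 2, -1],
     [0, 0, -1, 2]]

print([cofactor_det(A, i) for i in range(4)])          # the same answer from every row
print(cofactor_det([[0, 0, 1], [0, 2, 3], [4, 5, 6]])) # the matrix of Example 1
```

Every row of the 4 by 4 matrix gives 5, and the matrix from Example 1 gives -8, matching the pivot formula with its one row exchange.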

We preferred to construct det A from its properties (linearity, sign reversal, det I = 1). 
The big formula (8) and the cofactor formulas (10)-(12) follow from those rules. 
One last formula comes from the rule that $\det A = \det A^T$. We can expand in cofactors,
down a column instead of across a row. Down column j the entries are $a_{1j}$ to $a_{nj}$. The
cofactors are $C_{1j}$ to $C_{nj}$. The determinant is the dot product:


$$\text{Cofactors down column } j \qquad \det A = a_{1j}C_{1j} + a_{2j}C_{2j} + \cdots + a_{nj}C_{nj}. \tag{13}$$


Cofactors are useful when matrices have many zeros—as in the next examples. 


Example 6 The -1, 2, -1 matrix has only two nonzeros in its first row. So only two
cofactors $C_{11}$ and $C_{12}$ are involved in the determinant. I will highlight $C_{12}$:


$$\begin{vmatrix} 2 & -1 & & \\ -1 & 2 & -1 & \\ & -1 & 2 & -1 \\ & & -1 & 2 \end{vmatrix} = 2 \begin{vmatrix} 2 & -1 & \\ -1 & 2 & -1 \\ & -1 & 2 \end{vmatrix} + \begin{vmatrix} -1 & -1 & \\ & 2 & -1 \\ & -1 & 2 \end{vmatrix}. \tag{14}$$


You see 2 times $C_{11}$ first on the right, from crossing out row 1 and column 1. This cofactor
$C_{11}$ has exactly the same -1, 2, -1 pattern as the original A, but one size smaller.

To compute the boldface $C_{12}$, use cofactors down its first column. The only nonzero
is at the top. That contributes another -1 (so we are back to minus). Its cofactor is the
-1, 2, -1 determinant which is 2 by 2, two sizes smaller than the original A.


Summary Each determinant $D_n$ of order n comes from $D_{n-1}$ and $D_{n-2}$:

$$D_4 = 2D_3 - D_2 \quad \text{and generally} \quad D_n = 2D_{n-1} - D_{n-2}. \tag{15}$$

Direct calculation gives $D_2 = 3$ and $D_3 = 4$. Equation (14) has $D_4 = 2(4) - 3 = 5$.


These determinants 3, 4, 5 fit the formula $D_n = n + 1$. Then $D_n$ equals 2n - (n - 1).
That “special tridiagonal answer” also came from the product of pivots in Example 2. 
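
The recursion (15) is easy to run, and a direct cofactor computation confirms it for small sizes. This is a sketch; `second_difference` and `cofactor_det` are illustrative names, and the starting values D_1 = 2, D_2 = 3 come from the 1 by 1 and 2 by 2 corners.

```python
def second_difference(n):
    """The n by n matrix with 2 on the diagonal and -1 just above and below it."""
    return [[2 if i == j else -1 if abs(i - j) == 1 else 0 for j in range(n)]
            for i in range(n)]

def cofactor_det(a):
    """Determinant by cofactors of row 1, skipping zero entries."""
    if len(a) == 1:
        return a[0][0]
    total = 0
    for j, entry in enumerate(a[0]):
        if entry:
            minor = [row[:j] + row[j + 1:] for row in a[1:]]
            total += (-1) ** j * entry * cofactor_det(minor)
    return total

# D_n = 2 D_{n-1} - D_{n-2}, starting from D_1 = 2 and D_2 = 3
D = {1: 2, 2: 3}
for n in range(3, 9):
    D[n] = 2 * D[n - 1] - D[n - 2]

print(D)
print([cofactor_det(second_difference(n)) for n in range(1, 8)])
```

Both lists continue 2, 3, 4, 5, 6, ..., in agreement with D_n = n + 1.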

Example 7 This is the same matrix, except the first entry (upper left) is now 1:


$$B_4 = \begin{bmatrix} 1 & -1 & & \\ -1 & 2 & -1 & \\ & -1 & 2 & -1 \\ & & -1 & 2 \end{bmatrix}$$


All pivots of this matrix turn out to be 1. So its determinant is 1. How does that come
from cofactors? Expanding on row 1, the cofactors all agree with Example 6. Just change
$a_{11} = 2$ to $b_{11} = 1$:


$$\det B_4 = D_3 - D_2 \quad \text{instead of} \quad \det A_4 = 2D_3 - D_2.$$


The determinant of $B_4$ is 4 - 3 = 1. The determinant of every $B_n$ is n - (n - 1) = 1.
If you also change the last 2 into 1, why is det = 0?


= REVIEW OF THE KEY IDEAS = 


1. With no row exchanges, det A = (product of pivots). In the upper left corner of A,
$\det A_k$ = (product of the first k pivots).


2. Every term in the big formula (8) uses each row and column once. Half of the 
n! terms have plus signs (when det P = +1) and half have minus signs. 


3. The cofactor $C_{ij}$ is $(-1)^{i+j}$ times the smaller determinant that omits row i and
column j (because $a_{ij}$ uses that row and column).


4. The determinant is the dot product of any row of A with its row of cofactors. 
When a row of A has a lot of zeros, we only need a few cofactors. 


= WORKED EXAMPLES = 


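5.2A A Hessenberg matrix is a triangular matrix with one extra diagonal. Use cofactors
of row 1 to show that the 4 by 4 determinant satisfies Fibonacci’s rule $|H_4| = |H_3| + |H_2|$.
The same rule will continue for all sizes, $|H_n| = |H_{n-1}| + |H_{n-2}|$. Which Fibonacci
number is $|H_n|$?


$$H_2 = \begin{bmatrix} 2 & 1 \\ 1 & 2 \end{bmatrix} \qquad H_3 = \begin{bmatrix} 2 & 1 & \\ 1 & 2 & 1 \\ 1 & 1 & 2 \end{bmatrix} \qquad H_4 = \begin{bmatrix} 2 & 1 & & \\ 1 & 2 & 1 & \\ 1 & 1 & 2 & 1 \\ 1 & 1 & 1 & 2 \end{bmatrix}$$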

Solution The cofactor $C_{11}$ for $H_4$ is the determinant $|H_3|$. We also need $C_{12}$ (in boldface):


$$C_{12} = -\begin{vmatrix} 1 & 1 & 0 \\ 1 & 2 & 1 \\ 1 & 1 & 2 \end{vmatrix} = -\begin{vmatrix} 2 & 1 & 0 \\ 1 & 2 & 1 \\ 1 & 1 & 2 \end{vmatrix} + \begin{vmatrix} 1 & 0 & 0 \\ 1 & 2 & 1 \\ 1 & 1 & 2 \end{vmatrix}$$


Rows 2 and 3 stayed the same and we used linearity in row 1. The two determinants on the
right are $-|H_3|$ and $+|H_2|$. Then the 4 by 4 determinant is


$$|H_4| = 2C_{11} + 1C_{12} = 2|H_3| - |H_3| + |H_2| = |H_3| + |H_2|.$$


The actual numbers are $|H_2| = 3$ and $|H_3| = 5$ (and of course $|H_1| = 2$). Since
$|H_n| = 2, 3, 5, 8, \ldots$ follows Fibonacci’s rule $|H_{n-1}| + |H_{n-2}|$, it must be $|H_n| = F_{n+2}$.
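
A quick experiment supports this. The sketch below builds H_n from the pattern visible in H_2, H_3, H_4 (2 on the diagonal, 1 just above it, 1 everywhere below it), which is an assumption about the larger sizes; `hessenberg` and `cofactor_det` are illustrative names.

```python
def hessenberg(n):
    """2 on the diagonal, 1 on the diagonal above it, 1 in every entry below it."""
    return [[2 if i == j else 1 if (j == i + 1 or j < i) else 0 for j in range(n)]
            for i in range(n)]

def cofactor_det(a):
    """Determinant by cofactors of row 1, skipping zero entries."""
    if len(a) == 1:
        return a[0][0]
    total = 0
    for j, entry in enumerate(a[0]):
        if entry:
            minor = [row[:j] + row[j + 1:] for row in a[1:]]
            total += (-1) ** j * entry * cofactor_det(minor)
    return total

# Fibonacci numbers with F_1 = F_2 = 1; fib[k - 1] holds F_k
fib = [1, 1]
while len(fib) < 12:
    fib.append(fib[-1] + fib[-2])

dets = [cofactor_det(hessenberg(n)) for n in range(1, 9)]
print(dets)
print([fib[n + 1] for n in range(1, 9)])   # F_{n+2} for n = 1, ..., 8
```

Both lines should read 2, 3, 5, 8, 13, 21, 34, 55.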


5.2B These questions use the ± signs (even and odd P’s) in the big formula for det A:

1. If A is the 10 by 10 all-ones matrix, how does the big formula give det A = 0?

2. If you multiply all n! permutations together into a single P, is P odd or even?

3. If you multiply each $a_{ij}$ by the fraction i/j, why is det A unchanged?


Solution In Question 1, with all $a_{ij} = 1$, all the products in the big formula (8) will
be 1. Half of them come with a plus sign, and half with minus. So they cancel to leave
det A = 0. (Of course the all-ones matrix is singular. I am assuming n > 1.)

In Question 2, multiplying $\begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}$ gives an odd permutation. Also for 3 by 3, the
three odd permutations multiply (in any order) to give odd. But for n > 3 the product of
all permutations will be even. There are n!/2 odd permutations and that is an even number
as soon as n! includes the factor 4.

In Question 3, each $a_{ij}$ is multiplied by i/j. So each product $a_{1\alpha}a_{2\beta} \cdots a_{n\omega}$ in the big
formula is multiplied by all the row numbers i = 1, 2, ..., n and divided by all the column
numbers j = 1, 2, ..., n. (The columns come in some permuted order!) Then each product
is unchanged and det A stays the same.

Another approach to Question 3: We are multiplying the matrix A by the diagonal
matrix D = diag(1 : n) when row i is multiplied by i. And we are postmultiplying by
$D^{-1}$ when column j is divided by j. The determinant of $DAD^{-1}$ is the same as det A
by the product rule. 
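
Question 3 can be tried on a small example. This is only a sketch: the 3 by 3 matrix is an arbitrary choice, and exact fractions are used so that no roundoff hides the equality.

```python
from fractions import Fraction

def cofactor_det(a):
    """Determinant by cofactors of row 1."""
    if len(a) == 1:
        return a[0][0]
    total = 0
    for j, entry in enumerate(a[0]):
        minor = [row[:j] + row[j + 1:] for row in a[1:]]
        total += (-1) ** j * entry * cofactor_det(minor)
    return total

A = [[2, 7, 1],
     [3, -1, 4],
     [5, 0, 6]]          # arbitrary sample matrix (an assumption)

# Multiply each a_ij by i/j, with rows and columns numbered from 1.
B = [[Fraction(i, j) * A[i - 1][j - 1] for j in range(1, 4)] for i in range(1, 4)]

print(cofactor_det(A), cofactor_det(B))
```

The matrix B is D A D^{-1} with D = diag(1, 2, 3), and the two determinants agree.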


Problem Set 5.2 


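Problems 1-10 use the big formula with n! terms: $|A| = \sum \pm a_{1\alpha}a_{2\beta} \cdots a_{n\omega}$.
Every term uses each row and each column once.


1 Compute the determinants of A, B, C from six terms. Are their rows independent?

$$A = \begin{bmatrix} 1 & 2 & 3 \\ 3 & 1 & 2 \\ 3 & 2 & 1 \end{bmatrix} \qquad B = \begin{bmatrix} 1 & 2 & 3 \\ 4 & 4 & 4 \\ 5 & 6 & 7 \end{bmatrix} \qquad C = \begin{bmatrix} 1 & 1 & 1 \\ 1 & 1 & 0 \\ 1 & 0 & 0 \end{bmatrix}$$

2 Compute the determinants of A, B, C, D. Are their columns independent?

$$A = \begin{bmatrix} 1 & 1 & 0 \\ 1 & 0 & 1 \\ 0 & 1 & 1 \end{bmatrix} \qquad B = \begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 9 \end{bmatrix} \qquad C = \begin{bmatrix} A & 0 \\ 0 & A \end{bmatrix} \qquad D = \begin{bmatrix} A & 0 \\ 0 & B \end{bmatrix}$$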


3 Show that det A = 0, regardless of the five nonzeros marked by x’s:


$$A = \begin{bmatrix} x & x & x \\ 0 & 0 & x \\ 0 & 0 & x \end{bmatrix}$$

What are the cofactors of row 1? What is the rank of A? What are the 6 terms in det A?


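4 Find two ways to choose nonzeros from four different rows and columns:

$$A = \begin{bmatrix} 1 & 0 & 0 & 1 \\ 0 & 1 & 1 & 1 \\ 1 & 1 & 0 & 1 \\ 1 & 0 & 0 & 1 \end{bmatrix} \qquad B = \begin{bmatrix} 1 & 0 & 0 & 2 \\ 0 & 3 & 4 & 5 \\ 5 & 4 & 0 & 3 \\ 2 & 0 & 0 & 1 \end{bmatrix} \quad \text{(B has the same zeros as A).}$$


Is det A equal to 1 + 1 or 1 - 1 or -1 - 1? What is det B?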


5 Place the smallest number of zeros in a 4 by 4 matrix that will guarantee det A = 0. 
Place as many zeros as possible while still allowing det A ≠ 0.


6 (a) If a11 = a22 = a33 = 0, how many of the six terms in det A will be zero? 


(b) If $a_{11} = a_{22} = a_{33} = a_{44} = 0$, how many of the 24 products $a_{1j}a_{2k}a_{3l}a_{4m}$
are sure to be zero?


7 How many 5 by 5 permutation matrices have det P = +1? Those are even permuta- 
tions. Find one that needs four exchanges to reach the identity matrix. 


8 If det A is not zero, at least one of the n! terms in formula (8) is not zero. Deduce 
from the big formula that some ordering of the rows of A leaves no zeros on the 
diagonal. (Don’t use P from elimination; that PA can have zeros on the diagonal.) 


9 Show that 4 is the largest determinant for a 3 by 3 matrix of 1’s and —1’s. 


10 How many permutations of (1, 2, 3, 4) are even and what are they? Extra credit:
What are all the possible 4 by 4 determinants of $I + P_{\text{even}}$?


Problems 11-22 use cofactors $C_{ij} = (-1)^{i+j} \det M_{ij}$. Remove row i and column j.


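11 Find all cofactors and put them into cofactor matrices C, D. Find AC and det B.


$$A = \begin{bmatrix} 2 & 1 \\ 3 & 6 \end{bmatrix} \qquad B = \begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 0 & 0 \end{bmatrix}$$

12 Find the cofactor matrix C and multiply A times $C^T$. Compare $AC^T$ with $A^{-1}$:


$$A = \begin{bmatrix} 2 & -1 & 0 \\ -1 & 2 & -1 \\ 0 & -1 & 2 \end{bmatrix} \qquad A^{-1} = \frac{1}{4} \begin{bmatrix} 3 & 2 & 1 \\ 2 & 4 & 2 \\ 1 & 2 & 3 \end{bmatrix}$$


13 The n by n determinant $C_n$ has 1’s above and below the main diagonal:


$$C_1 = \begin{vmatrix} 0 \end{vmatrix} \qquad C_2 = \begin{vmatrix} 0 & 1 \\ 1 & 0 \end{vmatrix} \qquad C_3 = \begin{vmatrix} 0 & 1 & 0 \\ 1 & 0 & 1 \\ 0 & 1 & 0 \end{vmatrix} \qquad C_4 = \begin{vmatrix} 0 & 1 & 0 & 0 \\ 1 & 0 & 1 & 0 \\ 0 & 1 & 0 & 1 \\ 0 & 0 & 1 & 0 \end{vmatrix}$$


(a) What are these determinants $C_1, C_2, C_3, C_4$?
(b) By cofactors find the relation between $C_n$ and $C_{n-1}$ and $C_{n-2}$. Find $C_{10}$.


14 The matrices in Problem 13 have 1’s just above and below the main diagonal. Going
down the matrix, which order of columns (if any) gives all 1’s? Explain why that
permutation is even for n = 4, 8, 12, ... and odd for n = 2, 6, 10, .... Then


$$C_n = 0 \text{ (odd } n) \qquad C_n = 1 \text{ (} n = 4, 8, \ldots) \qquad C_n = -1 \text{ (} n = 2, 6, \ldots).$$


15 The tridiagonal 1, 1, 1 matrix of order n has determinant $E_n$:


$$E_1 = \begin{vmatrix} 1 \end{vmatrix} \qquad E_2 = \begin{vmatrix} 1 & 1 \\ 1 & 1 \end{vmatrix} \qquad E_3 = \begin{vmatrix} 1 & 1 & 0 \\ 1 & 1 & 1 \\ 0 & 1 & 1 \end{vmatrix} \qquad E_4 = \begin{vmatrix} 1 & 1 & 0 & 0 \\ 1 & 1 & 1 & 0 \\ 0 & 1 & 1 & 1 \\ 0 & 0 & 1 & 1 \end{vmatrix}$$

(a) By cofactors show that $E_n = E_{n-1} - E_{n-2}$.
(b) Starting from $E_1 = 1$ and $E_2 = 0$ find $E_3, E_4, \ldots, E_8$.
(c) By noticing how these numbers eventually repeat, find $E_{100}$.

16 $F_n$ is the determinant of the 1, 1, -1 tridiagonal matrix of order n:

$$F_2 = \begin{vmatrix} 1 & -1 \\ 1 & 1 \end{vmatrix} = 2 \qquad F_3 = \begin{vmatrix} 1 & -1 & 0 \\ 1 & 1 & -1 \\ 0 & 1 & 1 \end{vmatrix} = 3 \qquad F_4 = \begin{vmatrix} 1 & -1 & & \\ 1 & 1 & -1 & \\ & 1 & 1 & -1 \\ & & 1 & 1 \end{vmatrix} \neq 4.$$

Expand in cofactors to show that $F_n = F_{n-1} + F_{n-2}$. These determinants are
Fibonacci numbers 1, 2, 3, 5, 8, 13, .... The sequence usually starts 1, 1, 2, 3 (with
two 1’s) so our $F_n$ is the usual $F_{n+1}$.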
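
17 The matrix $B_n$ is the -1, 2, -1 matrix $A_n$ except that $b_{11} = 1$ instead of $a_{11} = 2$.
Using cofactors of the last row of $B_4$ show that $|B_4| = 2|B_3| - |B_2| = 1$.


The recursion $|B_n| = 2|B_{n-1}| - |B_{n-2}|$ is satisfied when every $|B_n| = 1$. This
recursion is the same as for the A’s in Example 6. The difference is in the starting
values 1, 1, 1 for the determinants of sizes n = 1, 2, 3.


18 Go back to $B_n$ in Problem 17. It is the same as $A_n$ except for $b_{11} = 1$. So use
linearity in the first row, where $[\,1 \;\; -1 \;\; 0\,]$ equals $[\,2 \;\; -1 \;\; 0\,]$ minus $[\,1 \;\; 0 \;\; 0\,]$:


$$|B_n| = \begin{vmatrix} 1 & -1 \quad 0 \\ \begin{matrix} -1 \\ 0 \end{matrix} & A_{n-1} \end{vmatrix} = \begin{vmatrix} 2 & -1 \quad 0 \\ \begin{matrix} -1 \\ 0 \end{matrix} & A_{n-1} \end{vmatrix} - \begin{vmatrix} 1 & 0 \quad 0 \\ \begin{matrix} -1 \\ 0 \end{matrix} & A_{n-1} \end{vmatrix}$$


Linearity gives $|B_n| = |A_n| - |A_{n-1}| =$ ____.


19 Explain why the 4 by 4 Vandermonde determinant contains $x^3$ but not $x^4$ or $x^5$:

$$V_4 = \det \begin{bmatrix} 1 & a & a^2 & a^3 \\ 1 & b & b^2 & b^3 \\ 1 & c & c^2 & c^3 \\ 1 & x & x^2 & x^3 \end{bmatrix}.$$

The determinant is zero at x = ____, ____, and ____. The cofactor of $x^3$ is
$V_3 = (b - a)(c - a)(c - b)$. Then $V_4 = (b - a)(c - a)(c - b)(x - a)(x - b)(x - c)$.


20 Find $G_2$ and $G_3$ and then by row operations $G_4$. Can you predict $G_n$?


$$G_2 = \begin{vmatrix} 0 & 1 \\ 1 & 0 \end{vmatrix} \qquad G_3 = \begin{vmatrix} 0 & 1 & 1 \\ 1 & 0 & 1 \\ 1 & 1 & 0 \end{vmatrix} \qquad G_4 = \begin{vmatrix} 0 & 1 & 1 & 1 \\ 1 & 0 & 1 & 1 \\ 1 & 1 & 0 & 1 \\ 1 & 1 & 1 & 0 \end{vmatrix}$$


21 Compute $S_1, S_2, S_3$ for these 1, 3, 1 tridiagonal matrices:


$$S_1 = \begin{vmatrix} 3 \end{vmatrix} \qquad S_2 = \begin{vmatrix} 3 & 1 \\ 1 & 3 \end{vmatrix} \qquad S_3 = \begin{vmatrix} 3 & 1 & 0 \\ 1 & 3 & 1 \\ 0 & 1 & 3 \end{vmatrix}$$


22 Change 3 to 2 in the upper left corner of the matrices in Problem 21. Why does
that subtract $S_{n-1}$ from the determinant $S_n$? Show that the determinants of the new
matrices become the Fibonacci numbers 2, 5, 13 (always $F_{2n+1}$).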


Problems 23-26 are about block matrices and block determinants.


23 With 2 by 2 blocks in 4 by 4 matrices, you cannot always use block determinants:


$$\begin{vmatrix} A & B \\ 0 & D \end{vmatrix} = |A|\,|D| \quad \text{but} \quad \begin{vmatrix} A & B \\ C & D \end{vmatrix} \neq |A|\,|D| - |C|\,|B|.$$


(a) Why is the first statement true? Somehow B doesn’t enter.
(b) Show by example that equality fails (as shown) when C enters.
(c) Show by example that the answer det(AD - CB) is also wrong.


24 With block multiplication, A = LU has $A_k = L_k U_k$ in the top left corner:


$$A = \begin{bmatrix} A_k & * \\ * & * \end{bmatrix} = \begin{bmatrix} L_k & 0 \\ * & * \end{bmatrix} \begin{bmatrix} U_k & * \\ 0 & * \end{bmatrix}.$$

(a) Suppose the first three pivots of A are 2, 3, -1. What are the determinants of
$L_1, L_2, L_3$ (with diagonal 1’s) and $U_1, U_2, U_3$ and $A_1, A_2, A_3$?


(b) If $A_1, A_2, A_3$ have determinants 5, 6, 7 find the three pivots from equation (3).


25 Block elimination subtracts $CA^{-1}$ times the first row [A B] from the second row
[C D]. This leaves the Schur complement $D - CA^{-1}B$ in the corner:


$$\begin{bmatrix} I & 0 \\ -CA^{-1} & I \end{bmatrix} \begin{bmatrix} A & B \\ C & D \end{bmatrix} = \begin{bmatrix} A & B \\ 0 & D - CA^{-1}B \end{bmatrix}.$$


Take determinants of these block matrices to prove correct rules if $A^{-1}$ exists:

$$\begin{vmatrix} A & B \\ C & D \end{vmatrix} = |A|\,|D - CA^{-1}B| = |AD - CB| \quad \text{provided } AC = CA.$$


26 If A is m by n and B is n by m, block multiplication gives det M = det AB:


$$M = \begin{bmatrix} 0 & A \\ -B & I \end{bmatrix} = \begin{bmatrix} AB & A \\ 0 & I \end{bmatrix} \begin{bmatrix} I & 0 \\ -B & I \end{bmatrix}.$$

If A is a single row and B is a single column what is det M? If A is a column and
B is a row what is det M? Do a 3 by 3 example of each.


27 (A calculus question) Show that the derivative of det A with respect to $a_{11}$ is the
cofactor $C_{11}$. The other entries are fixed. We are only changing $a_{11}$.

28 A 3 by 3 determinant has three products “down to the right” and three “down to the
left” with minus signs. Compute the six terms like (1)(5)(9) = 45 to find D.
Explain without determinants why this particular matrix is or is not invertible.

[Figure: the 3 by 3 matrix D, with its three “down to the left” products marked - and its three “down to the right” products marked +.]


29 For $E_4$ in Problem 15, five of the 4! = 24 terms in the big formula (8) are nonzero.
Find those five terms to show that $E_4 = -1$.


30 For the 4 by 4 tridiagonal second difference matrix (entries -1, 2, -1) find the five
terms in the big formula that give det A = 16 - 4 - 4 - 4 + 1.


31 Find the determinant of this cyclic P by cofactors of row 1 and then the “big
formula”. How many exchanges reorder 4, 1, 2, 3 into 1, 2, 3, 4? Is $|P^2|$ = 1 or -1?


$$P = \begin{bmatrix} 0 & 0 & 0 & 1 \\ 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \end{bmatrix} \qquad P^2 = \begin{bmatrix} 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \\ 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \end{bmatrix} = \begin{bmatrix} 0 & I \\ I & 0 \end{bmatrix}$$


Challenge Problems


32 Cofactors of the 1, 3, 1 matrices in Problem 21 give a recursion $S_n = 3S_{n-1} - S_{n-2}$.
Amazingly that recursion produces every second Fibonacci number. Here is the challenge.


Show that $S_n$ is the Fibonacci number $F_{2n+2}$ by proving $F_{2n+2} = 3F_{2n} - F_{2n-2}$.
Keep using Fibonacci’s rule $F_k = F_{k-1} + F_{k-2}$ starting with k = 2n + 2.


33 The symmetric Pascal matrices have determinant 1. If I subtract 1 from the n, n
entry, why does the determinant become zero? (Use rule 3 or cofactors.)


$$\det \begin{bmatrix} 1 & 1 & 1 & 1 \\ 1 & 2 & 3 & 4 \\ 1 & 3 & 6 & 10 \\ 1 & 4 & 10 & 20 \end{bmatrix} = 1 \text{ (known)} \qquad \det \begin{bmatrix} 1 & 1 & 1 & 1 \\ 1 & 2 & 3 & 4 \\ 1 & 3 & 6 & 10 \\ 1 & 4 & 10 & 19 \end{bmatrix} = 0 \text{ (to explain).}$$
34 This problem shows in two ways that $\det A = 0$ (the $x$'s are any numbers):

$$A = \begin{bmatrix} x & x & x & x & x \\ x & x & x & x & x \\ 0 & 0 & 0 & x & x \\ 0 & 0 & 0 & x & x \\ 0 & 0 & 0 & x & x \end{bmatrix}$$

(a) How do you know that the rows are linearly dependent?
(b) Explain why all 120 terms are zero in the big formula for $\det A$.

35 If $|\det(A)| > 1$, prove that the powers $A^n$ cannot stay bounded. But if $|\det(A)| < 1$,
show that some entries of $A^n$ might still grow large. Eigenvalues will give the right
test for stability, determinants tell us only one number.
5.3 Cramer's Rule, Inverses, and Volumes


1 $A^{-1}$ equals $C^T / \det A$. Then $(A^{-1})_{ij}$ = cofactor $C_{ji}$ divided by the determinant of $A$.
2 Cramer's Rule computes $x = A^{-1}b$ from $x_j$ = det($A$ with column $j$ changed to $b$) / $\det A$.
3 Area of parallelogram = $|ad - bc|$ if the four corners are $(0,0)$, $(a,b)$, $(c,d)$, and $(a+c, b+d)$.
4 Volume of box = $|\det A|$ if the rows of $A$ (or the columns of $A$) give the sides of the box.
5 The cross product $w = u \times v$ is $\det \begin{bmatrix} i & j & k \\ u_1 & u_2 & u_3 \\ v_1 & v_2 & v_3 \end{bmatrix}$. Notice $v \times u = -(u \times v)$.
  $w_1, w_2, w_3$ are cofactors of row 1. Notice $w^T u = 0$ and $w^T v = 0$.


This section solves $Ax = b$ and also finds $A^{-1}$, by algebra and not by elimination.
In all formulas you will see a division by $\det A$. Each entry in $A^{-1}$ and $A^{-1}b$ is a
determinant divided by the determinant of $A$. Let me start with Cramer's Rule.


Cramer's Rule solves $Ax = b$. A neat idea gives the first component $x_1$. Replacing
the first column of $I$ by $x$ gives a matrix with determinant $x_1$. When you multiply it by $A$,
the first column becomes $Ax$ which is $b$. The other columns of $B_1$ are copied from $A$:


Key idea
$$\begin{bmatrix} & A & \end{bmatrix} \begin{bmatrix} x_1 & 0 & 0 \\ x_2 & 1 & 0 \\ x_3 & 0 & 1 \end{bmatrix} = \begin{bmatrix} b_1 & a_{12} & a_{13} \\ b_2 & a_{22} & a_{23} \\ b_3 & a_{32} & a_{33} \end{bmatrix} = B_1. \tag{1}$$


We multiplied a column at a time. Take determinants of the three matrices to find $x_1$:


Product rule
$$(\det A)(x_1) = \det B_1 \quad \text{or} \quad x_1 = \frac{\det B_1}{\det A}. \tag{2}$$


This is the first component of $x$ in Cramer's Rule! Changing a column of $A$ gave $B_1$.
To find $x_2$ and $B_2$, put the vectors $x$ and $b$ into the second columns of $I$ and $A$:


Same idea
$$\begin{bmatrix} a_1 & a_2 & a_3 \end{bmatrix} \begin{bmatrix} 1 & x_1 & 0 \\ 0 & x_2 & 0 \\ 0 & x_3 & 1 \end{bmatrix} = \begin{bmatrix} a_1 & b & a_3 \end{bmatrix} = B_2. \tag{3}$$


Take determinants to find $(\det A)(x_2) = \det B_2$. This gives $x_2 = (\det B_2)/(\det A)$.


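Example 1 Solving $3x_1 + 4x_2 = 2$ and $5x_1 + 6x_2 = 4$ needs three determinants:

$$\det A = \begin{vmatrix} 3 & 4 \\ 5 & 6 \end{vmatrix} \qquad \det B_1 = \begin{vmatrix} 2 & 4 \\ 4 & 6 \end{vmatrix} \qquad \det B_2 = \begin{vmatrix} 3 & 2 \\ 5 & 4 \end{vmatrix}$$

Those determinants of $A, B_1, B_2$ are $-2$ and $-4$ and $2$. All ratios divide by $\det A = -2$:

Find $x = A^{-1}b$
$$x_1 = \frac{-4}{-2} = 2 \qquad x_2 = \frac{2}{-2} = -1 \qquad \text{Check} \quad \begin{bmatrix} 3 & 4 \\ 5 & 6 \end{bmatrix} \begin{bmatrix} 2 \\ -1 \end{bmatrix} = \begin{bmatrix} 2 \\ 4 \end{bmatrix}$$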


CRAMER'S RULE If $\det A$ is not zero, $Ax = b$ is solved by determinants:

$$x_1 = \frac{\det B_1}{\det A} \qquad x_2 = \frac{\det B_2}{\det A} \qquad \ldots \qquad x_n = \frac{\det B_n}{\det A} \tag{4}$$

The matrix $B_j$ has the $j$th column of $A$ replaced by the vector $b$.


To solve an n by n system, Cramer’s Rule evaluates n + 1 determinants (of A and the 
n different B’s). When each one is the sum of n! terms—applying the “big formula” with 
all permutations—this makes a total of (n + 1)! terms. It would be crazy to solve equations 
that way. But we do finally have an explicit formula for the solution $x$.
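The rule in equation (4) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the helper names `det` and `cramer` are hypothetical, the determinant uses cofactor expansion (practical only for small matrices, as the text warns), and exact fractions are used so the ratios come out without rounding.

```python
from fractions import Fraction


def det(M):
    """Determinant by cofactor expansion along the first row (small n only)."""
    n = len(M)
    if n == 0:
        return 1  # determinant of the empty matrix; lets 1 by 1 minors work
    total = 0
    for j in range(n):
        minor = [row[:j] + row[j + 1:] for row in M[1:]]
        total += (-1) ** j * M[0][j] * det(minor)
    return total


def cramer(A, b):
    """Solve Ax = b by x_j = det(B_j) / det(A), where B_j is A with column j replaced by b."""
    d = det(A)
    if d == 0:
        raise ValueError("det A = 0: Cramer's Rule does not apply")
    x = []
    for j in range(len(A)):
        B_j = [row[:j] + [b[i]] + row[j + 1:] for i, row in enumerate(A)]
        x.append(Fraction(det(B_j), d))
    return x


# Example 1 from the text: 3x1 + 4x2 = 2 and 5x1 + 6x2 = 4
solution = cramer([[3, 4], [5, 6]], [2, 4])
print(solution)
```

For an n by n system this evaluates n + 1 determinants, which is exactly why the method is a formula to reason with rather than an algorithm to compute with.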


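Example 2 Cramer's Rule is inefficient for numbers but it is well suited to letters. For
$n = 2$, find the columns of $A^{-1} = [\,x \ \ y\,]$ by solving $AA^{-1} = I$:

Columns of $A^{-1}$ are $x$ and $y$
$$\begin{bmatrix} a & b \\ c & d \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} 1 \\ 0 \end{bmatrix} \qquad \begin{bmatrix} a & b \\ c & d \end{bmatrix} \begin{bmatrix} y_1 \\ y_2 \end{bmatrix} = \begin{bmatrix} 0 \\ 1 \end{bmatrix}$$

Those share the same matrix $A$. We need $|A|$ and four determinants for $x_1, x_2, y_1, y_2$:

$$\begin{vmatrix} a & b \\ c & d \end{vmatrix} \quad \text{and} \quad \begin{vmatrix} 1 & b \\ 0 & d \end{vmatrix} \quad \begin{vmatrix} a & 1 \\ c & 0 \end{vmatrix} \quad \begin{vmatrix} 0 & b \\ 1 & d \end{vmatrix} \quad \begin{vmatrix} a & 0 \\ c & 1 \end{vmatrix}$$

The last four determinants are $d$, $-c$, $-b$, and $a$. (They are the cofactors!) Here is $A^{-1}$:

$$x_1 = \frac{d}{|A|}, \quad x_2 = \frac{-c}{|A|}, \quad y_1 = \frac{-b}{|A|}, \quad y_2 = \frac{a}{|A|} \quad \text{and then} \quad A^{-1} = \frac{1}{ad - bc} \begin{bmatrix} d & -b \\ -c & a \end{bmatrix}.$$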


I chose 2 by 2 so that the main points could come through clearly. The new idea is:
$A^{-1}$ involves the cofactors. When the right side is a column of the identity matrix $I$,
as in $AA^{-1} = I$, the determinant of each $B_j$ in Cramer's Rule is a cofactor of $A$.


You can see those cofactors for $n = 3$. Solve $Ax = (1, 0, 0)$ to find column 1 of $A^{-1}$:


Determinants of $B$'s = Cofactors of $A$
$$\begin{vmatrix} 1 & a_{12} & a_{13} \\ 0 & a_{22} & a_{23} \\ 0 & a_{32} & a_{33} \end{vmatrix} \qquad \begin{vmatrix} a_{11} & 1 & a_{13} \\ a_{21} & 0 & a_{23} \\ a_{31} & 0 & a_{33} \end{vmatrix} \qquad \begin{vmatrix} a_{11} & a_{12} & 1 \\ a_{21} & a_{22} & 0 \\ a_{31} & a_{32} & 0 \end{vmatrix} \tag{5}$$


That first determinant $|B_1|$ is the cofactor $C_{11} = a_{22}a_{33} - a_{23}a_{32}$. Then $|B_2|$ is the cofactor
$C_{12}$. Notice that the correct minus sign appears in $-(a_{21}a_{33} - a_{23}a_{31})$. This cofactor $C_{12}$
goes into column 1 of $A^{-1}$. When we divide by $\det A$, we have the inverse matrix!
The $i, j$ entry of $A^{-1}$ is the cofactor $C_{ji}$ (not $C_{ij}$) divided by $\det A$:


FORMULA FOR $A^{-1}$
$$(A^{-1})_{ij} = \frac{C_{ji}}{\det A} \quad \text{and} \quad A^{-1} = \frac{C^T}{\det A} \tag{6}$$


The cofactors $C_{ij}$ go into the "cofactor matrix" $C$. The transpose of $C$ leads to $A^{-1}$.
To compute the $i, j$ entry of $A^{-1}$, cross out row $j$ and column $i$ of $A$. Multiply the
determinant by $(-1)^{i+j}$ to get the cofactor $C_{ji}$, and divide by $\det A$.

Check this rule for the 3,1 entry of $A^{-1}$. For column 1 we solve $Ax = (1, 0, 0)$.
The third component $x_3$ needs the third determinant in equation (5), divided by $\det A$.
That determinant is exactly the cofactor $C_{13} = a_{21}a_{32} - a_{22}a_{31}$. So $(A^{-1})_{31} = C_{13}/\det A$.


Summary In solving $AA^{-1} = I$, each column of $I$ leads to a column of $A^{-1}$. Every
entry of $A^{-1}$ is a ratio: determinant of size $n - 1$ / determinant of size $n$.
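As a small sketch of formula (6), the following Python builds the cofactor matrix $C$ and returns $C^T / \det A$ with exact fractions. The function names (`det`, `minor`, `inverse_by_cofactors`) are hypothetical, chosen for this illustration, and the approach is only sensible for small matrices.

```python
from fractions import Fraction


def det(M):
    """Determinant by cofactor expansion along the first row (small n only)."""
    if len(M) == 0:
        return 1  # empty matrix, so that 1 by 1 minors are handled
    return sum((-1) ** j * M[0][j] * det([row[:j] + row[j + 1:] for row in M[1:]])
               for j in range(len(M)))


def minor(M, i, j):
    """Copy of M with row i and column j crossed out."""
    return [row[:j] + row[j + 1:] for k, row in enumerate(M) if k != i]


def inverse_by_cofactors(A):
    """Return A^{-1} = C^T / det A, where C[i][j] is the cofactor C_ij."""
    n = len(A)
    d = det(A)
    if d == 0:
        raise ValueError("det A = 0: A has no inverse")
    C = [[(-1) ** (i + j) * det(minor(A, i, j)) for j in range(n)] for i in range(n)]
    # Entry (i, j) of the inverse uses C[j][i]: remember to transpose C.
    return [[Fraction(C[j][i], d) for j in range(n)] for i in range(n)]


A = [[3, 4], [5, 6]]          # det A = -2
A_inv = inverse_by_cofactors(A)
print(A_inv)                  # rows of (1/(ad - bc)) [[d, -b], [-c, a]]
```

Indexing `C[j][i]` rather than `C[i][j]` is the whole content of "not $C_{ij}$" in the formula.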


Direct proof of the formula $A^{-1} = C^T / \det A$ This means $AC^T = (\det A)I$:


$$\begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{bmatrix} \begin{bmatrix} C_{11} & C_{21} & C_{31} \\ C_{12} & C_{22} & C_{32} \\ C_{13} & C_{23} & C_{33} \end{bmatrix} = \begin{bmatrix} \det A & 0 & 0 \\ 0 & \det A & 0 \\ 0 & 0 & \det A \end{bmatrix}. \tag{7}$$


(Row 1 of $A$) times (column 1 of $C^T$) yields the first $\det A$ on the right:

$$a_{11}C_{11} + a_{12}C_{12} + a_{13}C_{13} = \det A \qquad \text{This is exactly the cofactor rule!}$$


Similarly row 2 of $A$ times column 2 of $C^T$ (notice the transpose) also yields $\det A$.
The entries $a_{2j}$ are multiplying cofactors $C_{2j}$ as they should, to give the determinant.


How to explain the zeros off the main diagonal in equation (7)? The rows of $A$ are
multiplying cofactors from different rows. Why is the answer zero?


Row 2 of $A$, Row 1 of $C$
$$a_{21}C_{11} + a_{22}C_{12} + a_{23}C_{13} = 0. \tag{8}$$


Answer: This is the cofactor rule for a new matrix, when the second row of $A$ is copied into
its first row. The new matrix $A^*$ has two equal rows, so $\det A^* = 0$ in equation (8). Notice
that $A^*$ has the same cofactors $C_{11}, C_{12}, C_{13}$ as $A$, because all rows agree after the first
row. Thus the remarkable multiplication (7) is correct:


$$AC^T = (\det A)I \quad \text{or} \quad A^{-1} = \frac{C^T}{\det A}.$$
Example 3 The "sum matrix" $A$ has determinant 1. Then $A^{-1}$ contains cofactors:


$$A = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 1 & 1 & 0 & 0 \\ 1 & 1 & 1 & 0 \\ 1 & 1 & 1 & 1 \end{bmatrix} \quad \text{has inverse} \quad A^{-1} = \frac{C^T}{1} = \begin{bmatrix} 1 & 0 & 0 & 0 \\ -1 & 1 & 0 & 0 \\ 0 & -1 & 1 & 0 \\ 0 & 0 & -1 & 1 \end{bmatrix}.$$


Cross out row 1 and column 1 of $A$ to see the 3 by 3 cofactor $C_{11} = 1$. Now cross out row
1 and column 2 for $C_{12}$. The 3 by 3 submatrix is still triangular with determinant 1. But
the cofactor $C_{12}$ is $-1$ because of the sign $(-1)^{1+2}$. This number $-1$ goes into the (2, 1)
entry of $A^{-1}$. Don't forget to transpose $C$.

The inverse of a triangular matrix is triangular. Cofactors give a reason why. 


Example 4 If all cofactors are nonzero, is A sure to be invertible? No way. 


Area of a Triangle 


Everybody knows the area of a rectangle—base times height. The area of a triangle is half 
the base times the height. But here is a question that those formulas don’t answer. If we 
know the corners $(x_1, y_1)$ and $(x_2, y_2)$ and $(x_3, y_3)$ of a triangle, what is the area?
Using the corners to find the base and height is not a good way to compute area. 


Determinants are the best way to find area. The area of a triangle is half of a 3 by 3 
determinant. The square roots in the base and height cancel out in the good formula. If 
one corner is at the origin, say $(x_3, y_3) = (0, 0)$, the determinant is only 2 by 2.


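Figure 5.1: General triangle; special triangle from (0, 0); general from three specials.


The triangle with corners $(x_1, y_1)$ and $(x_2, y_2)$ and $(x_3, y_3)$ has area = determinant / 2:


Area of triangle
$$\frac{1}{2} \begin{vmatrix} x_1 & y_1 & 1 \\ x_2 & y_2 & 1 \\ x_3 & y_3 & 1 \end{vmatrix} \qquad \text{Area} = \frac{1}{2} \begin{vmatrix} x_1 & y_1 \\ x_2 & y_2 \end{vmatrix} \ \text{ when } (x_3, y_3) = (0, 0).$$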
When you set $x_3 = y_3 = 0$ in the 3 by 3 determinant, you get the 2 by 2 determinant. These
formulas have no square roots, so they are reasonable to memorize. The 3 by 3 determinant
breaks into a sum of three 2 by 2's (cofactors), just as the third triangle in Figure 5.1 breaks
into three special triangles from (0, 0):


$$\text{Area} = \frac{1}{2} \begin{vmatrix} x_1 & y_1 & 1 \\ x_2 & y_2 & 1 \\ x_3 & y_3 & 1 \end{vmatrix} = \begin{array}{l} +\frac{1}{2}(x_1 y_2 - x_2 y_1) \\ +\frac{1}{2}(x_2 y_3 - x_3 y_2) \\ +\frac{1}{2}(x_3 y_1 - x_1 y_3). \end{array} \tag{9}$$


If (0, 0) is outside the triangle, two of the special areas can be negative—but the sum is still 
correct. The real problem is to explain the area of a triangle with corner (0, 0). 
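Equation (9) translates directly into code. The sketch below is an illustration only (the name `triangle_area` is an assumption); it adds the three signed 2 by 2 determinants and takes half the absolute value, with no square roots anywhere.

```python
def triangle_area(p1, p2, p3):
    """Area from corners: half the absolute value of the 3 by 3 determinant in (9)."""
    (x1, y1), (x2, y2), (x3, y3) = p1, p2, p3
    # Three signed 2 by 2 determinants, one for each special triangle from (0, 0).
    d = (x1 * y2 - x2 * y1) + (x2 * y3 - x3 * y2) + (x3 * y1 - x1 * y3)
    return abs(d) / 2


# A triangle with one corner at the origin: only the 2 by 2 determinant survives.
print(triangle_area((4, 1), (1, 3), (0, 0)))
# The same triangle shifted by (1, 2): the origin is now outside, the area is unchanged.
print(triangle_area((5, 3), (2, 5), (1, 2)))
```

In the shifted triangle two of the three signed terms are negative, yet the sum is still correct, which is the point made in the paragraph above.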

Why is $\frac{1}{2}|x_1 y_2 - x_2 y_1|$ the area of this triangle? We can remove the factor $\frac{1}{2}$ for
a parallelogram (twice as big, because the parallelogram contains two equal triangles).
We now prove that the parallelogram area is the determinant $x_1 y_2 - x_2 y_1$. This area in
Figure 5.2 is 11, and therefore the triangle has area $\frac{11}{2}$.


[Figure 5.2 shows the parallelogram from (0, 0) with sides to $(x_1, y_1) = (4, 1)$ and $(x_2, y_2) = (1, 3)$.
Parallelogram: Area = $\begin{vmatrix} 4 & 1 \\ 1 & 3 \end{vmatrix} = 11$. Triangle: Area = $\frac{11}{2}$.]


Figure 5.2: A triangle is half of a parallelogram. Area is half of a determinant.


Proof that a parallelogram starting from (0,0) has area = 2 by 2 determinant. 


There are many proofs but this one fits with the book. We show that the area has the same 
properties 1-2-3 as the determinant. Then area = determinant! Remember that those three 
rules defined the determinant and led to all its other properties. 


1 When $A = I$, the parallelogram becomes the unit square. Its area is $\det I = 1$.


2 When rows are exchanged, the determinant reverses sign. The absolute value (positive 
area) stays the same—it is the same parallelogram. 


3 If row 1 is multiplied by $t$, Figure 5.3a shows that the area is also multiplied by $t$.
Suppose a new row $(x_1', y_1')$ is added to $(x_1, y_1)$ (keeping row 2 fixed). Figure 5.3b shows
that the solid parallelogram areas add to the dotted parallelogram area (because the two
triangles completed by dotted lines are the same).


That is an exotic proof, when we could use plane geometry. But the proof has a major 
attraction—it applies in n dimensions. The n edges going out from the origin are given by 
the rows of an n by n matrix. The box is completed by more edges, like the parallelogram. 
[Figure 5.3a: Full area = $tA$. Figure 5.3b: Dotted area = Solid area = $A + A'$, with corner $(x_1 + x_1', y_1 + y_1')$.]


Figure 5.3: Areas obey the rule of linearity in side 1 (keeping the side $(x_2, y_2)$ constant).


Figure 5.4 shows a three-dimensional box—whose edges are not at right angles. The 
volume equals the absolute value of det A. Our proof checks again that rules 1-3 for 
determinants are also obeyed by volumes. When an edge is stretched by a factor t, the 
volume is multiplied by t. When edge 1 is added to edge 1’, the volume is the sum of the 
two original volumes. This is Figure 5.3b lifted into three dimensions or n dimensions. I 
would draw the boxes but this paper is only two-dimensional. 


[Figure 5.4 shows a box with edges $(a_{11}, a_{12}, a_{13})$, $(a_{21}, a_{22}, a_{23})$, $(a_{31}, a_{32}, a_{33})$. Volume of box = |determinant|.]


Figure 5.4: Three-dimensional box formed from the three rows of $A$.


The unit cube has volume = 1, which is $\det I$. Row exchanges or edge exchanges leave
the same box and the same absolute volume. The determinant changes sign, to indicate
whether the edges are a right-handed triple ($\det A > 0$) or a left-handed triple ($\det A < 0$).
The box volume follows the rules for determinants, so volume = absolute value of $\det A$.


Example 5 Suppose a rectangular box (90° angles) has side lengths $r$, $s$, and $t$. Its
volume is $r$ times $s$ times $t$. The diagonal matrix $A$ with entries $r$, $s$, and $t$ produces those
three sides. Then $\det A$ also equals the volume $rst$.
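The volume rule is easy to try numerically. This is a minimal sketch with assumed helper names (`det3`, `box_volume`): the rows of the matrix are the three edges from the origin, and the volume is the absolute value of the 3 by 3 determinant.

```python
def det3(A):
    """3 by 3 determinant by cofactors of row 1."""
    (a, b, c), (d, e, f), (g, h, i) = A
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


def box_volume(edges):
    """Volume of the box whose three edges from the origin are the rows of `edges`."""
    return abs(det3(edges))


rectangular = [[2, 0, 0], [0, 3, 0], [0, 0, 4]]   # sides r, s, t = 2, 3, 4
sheared = [[2, 0, 0], [1, 3, 0], [1, 1, 4]]       # edges no longer at right angles
exchanged = [[1, 3, 0], [2, 0, 0], [1, 1, 4]]     # same box, first two edges swapped

print(box_volume(rectangular), box_volume(sheared), box_volume(exchanged))
print(det3(sheared), det3(exchanged))             # the sign records the handedness
```

The sheared box keeps the same base and height as the rectangular one, so the volume stays at $rst$; exchanging two edges flips the sign of the determinant but not the volume.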
Example 6 In calculus, the box is infinitesimally small! To integrate over a circle, we
might change $x$ and $y$ to $r$ and $\theta$. Those are polar coordinates: $x = r \cos\theta$ and $y = r \sin\theta$.
The area of a "polar box" is a determinant $J$ times $dr\, d\theta$:


Area $r\, dr\, d\theta$ in calculus
$$J = \begin{vmatrix} \partial x/\partial r & \partial x/\partial\theta \\ \partial y/\partial r & \partial y/\partial\theta \end{vmatrix} = \begin{vmatrix} \cos\theta & -r\sin\theta \\ \sin\theta & r\cos\theta \end{vmatrix} = r.$$


This determinant is the $r$ in the small area $dA = r\, dr\, d\theta$. The stretching factor $J$ goes into
double integrals just as $dx/du$ goes into an ordinary integral $\int dx = \int (dx/du)\, du$. For
triple integrals the Jacobian matrix $J$ with nine derivatives will be 3 by 3.
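A quick numerical check of the polar Jacobian, as a sketch only (the function name `polar_jacobian` is an assumption): it evaluates the 2 by 2 determinant of partial derivatives of $x = r\cos\theta$, $y = r\sin\theta$ and compares it with $r$.

```python
import math


def polar_jacobian(r, theta):
    """2 by 2 determinant of partial derivatives of x = r cos(theta), y = r sin(theta)."""
    dx_dr, dx_dtheta = math.cos(theta), -r * math.sin(theta)
    dy_dr, dy_dtheta = math.sin(theta), r * math.cos(theta)
    return dx_dr * dy_dtheta - dx_dtheta * dy_dr


for r, theta in [(0.5, 0.0), (2.0, 1.0), (10.0, 2.5)]:
    # cos^2 + sin^2 = 1 leaves just r, whatever the angle is.
    print(r, theta, polar_jacobian(r, theta))
```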


The Cross Product 


The cross product is an extra (and optional) application, special for three dimensions. Start 
with vectors $u = (u_1, u_2, u_3)$ and $v = (v_1, v_2, v_3)$. Unlike the dot product, which is a
number, the cross product is a vector—also in three dimensions. It is written u x v and 
pronounced “u cross v.” The components of this cross product are 2 by 2 cofactors. We 
will explain the properties that make u x v useful in geometry and physics. 

This time we bite the bullet, and write down the formula before the properties. 


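DEFINITION The cross product of $u = (u_1, u_2, u_3)$ and $v = (v_1, v_2, v_3)$ is a vector


$$u \times v = \begin{vmatrix} i & j & k \\ u_1 & u_2 & u_3 \\ v_1 & v_2 & v_3 \end{vmatrix} = (u_2 v_3 - u_3 v_2)\,i + (u_3 v_1 - u_1 v_3)\,j + (u_1 v_2 - u_2 v_1)\,k. \tag{10}$$


This vector $u \times v$ is perpendicular to $u$ and $v$. The cross product $v \times u$ is $-(u \times v)$.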


Comment The 3 by 3 determinant is the easiest way to remember $u \times v$. It is not especially
legal, because the first row contains vectors $i, j, k$ and the other rows contain numbers.
In the determinant, the vector $i = (1, 0, 0)$ multiplies $u_2 v_3$ and $-u_3 v_2$. The result is
$(u_2 v_3 - u_3 v_2, 0, 0)$, which displays the first component of the cross product.

Notice the cyclic pattern of the subscripts: 2 and 3 give component 1 of u x v, then 3 
and 1 give component 2, then 1 and 2 give component 3. This completes the definition of 
u x v. Now we list the properties of the cross product: 


Property 1 v x u reverses rows 2 and 3 in the determinant so it equals — (u x v). 


Property 2 The cross product u x v is perpendicular to u (and also to v). The direct proof 
is to watch terms cancel, producing a zero dot product: 


$$u \cdot (u \times v) = u_1(u_2 v_3 - u_3 v_2) + u_2(u_3 v_1 - u_1 v_3) + u_3(u_1 v_2 - u_2 v_1) = 0. \tag{11}$$


The determinant for u-(u x v) has rows u, u and v (2 equal rows) so it is zero. 


Property 3 The cross product of any vector with itself (two equal rows) is u x u = 0. 


When $u$ and $v$ are parallel, the cross product is zero. When $u$ and $v$ are perpendicular, the
dot product is zero. One involves $\sin\theta$ and the other involves $\cos\theta$:


$$\|u \times v\| = \|u\|\,\|v\|\,|\sin\theta| \quad \text{and} \quad |u \cdot v| = \|u\|\,\|v\|\,|\cos\theta|. \tag{12}$$


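Example 7 $u = (3, 2, 0)$ and $v = (1, 4, 0)$ are in the $xy$ plane, $u \times v$ goes up the $z$ axis:

$$u \times v = \begin{vmatrix} i & j & k \\ 3 & 2 & 0 \\ 1 & 4 & 0 \end{vmatrix} = 10k. \qquad \text{The cross product is } u \times v = (0, 0, 10).$$


The length of $u \times v$ equals the area of the parallelogram with sides $u$ and $v$. This will
be important: In this example the area is 10.


Example 8 The cross product of $u = (1, 1, 1)$ and $v = (1, 1, 2)$ is $(1, -1, 0)$:

$$\begin{vmatrix} i & j & k \\ 1 & 1 & 1 \\ 1 & 1 & 2 \end{vmatrix} = i \begin{vmatrix} 1 & 1 \\ 1 & 2 \end{vmatrix} - j \begin{vmatrix} 1 & 1 \\ 1 & 2 \end{vmatrix} + k \begin{vmatrix} 1 & 1 \\ 1 & 1 \end{vmatrix} = i - j.$$


This vector $(1, -1, 0)$ is perpendicular to $(1, 1, 1)$ and $(1, 1, 2)$ as predicted. Area = $\sqrt{2}$.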


Example 9 The cross product of $i = (1, 0, 0)$ and $j = (0, 1, 0)$ obeys the right hand rule.
That cross product $k = i \times j$ goes up not down:


[Figure: $i \times j = k$. Rule: $u \times v$ points along your right thumb when the fingers curl from $u$ to $v$.]


Thus $i \times j = k$. The right hand rule also gives $j \times k = i$ and $k \times i = j$. Note the cyclic
order. In the opposite order (anti-cyclic) the thumb is reversed and the cross product goes
the other way: $k \times j = -i$ and $i \times k = -j$ and $j \times i = -k$. You see the three plus signs
and three minus signs from a 3 by 3 determinant.
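The component formula (10) and Properties 1 and 2 can be tried on the vectors of Examples 7 to 9. This is a minimal sketch; `cross` and `dot` are hypothetical helper names written out by hand so the cyclic subscript pattern stays visible.

```python
def cross(u, v):
    """Cross product from formula (10): subscripts cycle 2,3 then 3,1 then 1,2."""
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


u, v = (3, 2, 0), (1, 4, 0)
w = cross(u, v)
print(w)                        # Example 7: ten times k
print(cross(v, u))              # Property 1: reversed order flips the sign
print(dot(w, u), dot(w, v))     # Property 2: perpendicular to both u and v

i, j, k = (1, 0, 0), (0, 1, 0), (0, 0, 1)
print(cross(i, j), cross(k, j)) # cyclic order gives k, anti-cyclic gives -i
```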

The definition of u x v can be based on vectors instead of their components: 


DEFINITION The cross product is a vector with length $\|u\|\,\|v\|\,|\sin\theta|$. Its direction
is perpendicular to $u$ and $v$. It points "up" or "down" by the right hand rule.


This definition appeals to physicists, who hate to choose axes and coordinates. They see
$(u_1, u_2, u_3)$ as the position of a mass and $(F_x, F_y, F_z)$ as a force acting on it. If $F$ is
parallel to $u$, then $u \times F = 0$ and there is no turning. The cross product $u \times F$ is the turning
force or torque. It points along the turning axis (perpendicular to $u$ and $F$). Its length
$\|u\|\,\|F\| \sin\theta$ measures the "moment" that produces turning.


Triple Product = Determinant = Volume 


Since $u \times v$ is a vector, we can take its dot product with a third vector $w$. That produces
the triple product $(u \times v) \cdot w$. It is called a "scalar" triple product, because it is a number.
In fact it is a determinant, and it gives the volume of the $u, v, w$ box:


Triple product
$$(u \times v) \cdot w = \begin{vmatrix} w_1 & w_2 & w_3 \\ u_1 & u_2 & u_3 \\ v_1 & v_2 & v_3 \end{vmatrix} = \begin{vmatrix} u_1 & u_2 & u_3 \\ v_1 & v_2 & v_3 \\ w_1 & w_2 & w_3 \end{vmatrix}. \tag{13}$$


We can put $w$ in the top or bottom row. The two determinants are the same because
two row exchanges go from one to the other. Notice when this determinant is zero:


$(u \times v) \cdot w = 0$ exactly when the vectors $u, v, w$ lie in the same plane.
First reason $u \times v$ is perpendicular to that plane so its dot product with $w$ is zero.
Second reason Three vectors in a plane are dependent. The matrix is singular (det = 0).
Third reason Zero volume when the $u, v, w$ box is squashed onto a plane.


It is remarkable that (u x v) - w equals the volume of the box with sides u, v, w. This 
3 by 3 determinant carries tremendous information. Like ad — bc for a 2 by 2 matrix, it 
separates invertible from singular. Chapter 6 will be looking for singular. 
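Equation (13) says two computations must agree: the dot product $(u \times v) \cdot w$ and the 3 by 3 determinant with rows $u, v, w$. The sketch below checks that on one box and on one flattened box; the helper names are assumptions made for this illustration.

```python
def cross(u, v):
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def det3(A):
    """3 by 3 determinant by cofactors of row 1."""
    (a, b, c), (d, e, f), (g, h, i) = A
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


u, v, w = (2, 4, 0), (-1, 3, 0), (1, 2, 2)
triple = dot(cross(u, v), w)
print(triple, det3([u, v, w]), det3([w, u, v]))   # all three agree

# A third vector in the plane of u and v squashes the box: zero volume.
flat = tuple(a + b for a, b in zip(u, v))
print(dot(cross(u, v), flat))
```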


= REVIEW OF THE KEY IDEAS = 


1. Cramer's Rule solves $Ax = b$ by ratios like $x_1 = |B_1|/|A| = |b \ a_2 \cdots a_n|/|A|$.

2. When $C$ is the cofactor matrix for $A$, the inverse is $A^{-1} = C^T / \det A$.

3. The volume of a box is $|\det A|$, when the box edges are the rows of $A$.

4. Area and volume are needed to change variables in double and triple integrals.

5. In $R^3$, the cross product $u \times v$ is perpendicular to $u$ and $v$. Notice $i \times j = k$.


= WORKED EXAMPLES =


5.3 A If $A$ is singular, the equation $AC^T = (\det A)I$ becomes $AC^T$ = zero matrix.
Then each column of $C^T$ is in the nullspace of $A$. Those columns contain cofactors along
rows of $A$. So the cofactors quickly find the nullspace for a 3 by 3 matrix of rank 2. My
apologies that this comes so late!

Solve $Ax = 0$ by $x$ = cofactors along a row, for these singular matrices of rank 2:


Cofactors give nullspace
$$A = \begin{bmatrix} 1 & 4 & 7 \\ 2 & 3 & 9 \\ 2 & 2 & 8 \end{bmatrix} \qquad A = \begin{bmatrix} 1 & 1 & 2 \\ 1 & 1 & 1 \\ 1 & 1 & 1 \end{bmatrix}$$


Solution The first matrix has these cofactors along its top row (note each minus sign):


$$\begin{vmatrix} 3 & 9 \\ 2 & 8 \end{vmatrix} = 6 \qquad -\begin{vmatrix} 2 & 9 \\ 2 & 8 \end{vmatrix} = 2 \qquad \begin{vmatrix} 2 & 3 \\ 2 & 2 \end{vmatrix} = -2$$


Then $x = (6, 2, -2)$ solves $Ax = 0$. The cofactors along the second row are $(-18, -6, 6)$
which is just $-3x$. This is also in the one-dimensional nullspace of $A$.

The second matrix has zero cofactors along its first row. The nullvector $x = (0, 0, 0)$ is
not interesting. The cofactors of row 2 give $x = (1, -1, 0)$ which solves $Ax = 0$.

Every $n$ by $n$ matrix of rank $n - 1$ has at least one nonzero cofactor by Problem 3.3.12.
But for rank $n - 2$, all cofactors are zero and we only find $x = 0$.
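Worked Example 5.3 A is easy to replay in code. This sketch (function names are assumptions) computes the cofactors along one row of a 3 by 3 matrix and confirms that the resulting vector is sent to zero by $A$.

```python
def cofactors_of_row(A, i):
    """Cofactors C_i1, C_i2, C_i3 of row i (0-based) of a 3 by 3 matrix A."""
    result = []
    for j in range(3):
        # Cross out row i and column j, leaving a 2 by 2 matrix m.
        m = [[row[c] for c in range(3) if c != j]
             for k, row in enumerate(A) if k != i]
        result.append((-1) ** (i + j) * (m[0][0] * m[1][1] - m[0][1] * m[1][0]))
    return result


def times(A, x):
    """Matrix times vector."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]


A1 = [[1, 4, 7], [2, 3, 9], [2, 2, 8]]
x = cofactors_of_row(A1, 0)
print(x, times(A1, x))                 # a nonzero nullvector, and A1 x = 0
print(cofactors_of_row(A1, 1))         # a multiple of the same x

A2 = [[1, 1, 2], [1, 1, 1], [1, 1, 1]]
print(cofactors_of_row(A2, 0))         # all zero: this row gives no information
print(cofactors_of_row(A2, 1))         # a useful nullvector
```

When one row yields only zeros, as with the first row of the second matrix, trying another row is the natural next step for a rank 2 matrix.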


5.3 B Use Cramer's Rule with ratios $\det B_j / \det A$ to solve $Ax = b$. Also find the
inverse matrix $A^{-1} = C^T / \det A$. For this $b = (0, 0, 1)$ the solution $x$ is column 3 of
$A^{-1}$! Which cofactors are involved in computing that column $x = (x, y, z)$?


Column 3 of $A^{-1}$
$$\begin{bmatrix} 2 & 6 & 2 \\ 1 & 4 & 2 \\ 5 & 9 & 0 \end{bmatrix} \begin{bmatrix} x \\ y \\ z \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix}$$


Find the volumes of two boxes: edges are columns of $A$ and edges are rows of $A^{-1}$.


Solution The determinants of the $B_j$ (with right side $b$ placed in column $j$) are


$$|B_1| = \begin{vmatrix} 0 & 6 & 2 \\ 0 & 4 & 2 \\ 1 & 9 & 0 \end{vmatrix} = 4 \qquad |B_2| = \begin{vmatrix} 2 & 0 & 2 \\ 1 & 0 & 2 \\ 5 & 1 & 0 \end{vmatrix} = -2 \qquad |B_3| = \begin{vmatrix} 2 & 6 & 0 \\ 1 & 4 & 0 \\ 5 & 9 & 1 \end{vmatrix} = 2$$


Those are cofactors $C_{31}, C_{32}, C_{33}$ of row 3. Their dot product with row 3 is $\det A = 2$:

$$\det A = a_{31}C_{31} + a_{32}C_{32} + a_{33}C_{33} = (5, 9, 0) \cdot (4, -2, 2) = 2$$


The three ratios $\det B_j / \det A$ give the three components of $x = (2, -1, 1)$. This $x$ is the
third column of $A^{-1}$ because $b = (0, 0, 1)$ is the third column of $I$.
The cofactors along the other rows of $A$, divided by $\det A$, give the other columns of $A^{-1}$:


$$A^{-1} = \frac{C^T}{\det A} = \frac{1}{2} \begin{bmatrix} -18 & 18 & 4 \\ 10 & -10 & -2 \\ -11 & 12 & 2 \end{bmatrix}. \qquad \text{Multiply to check } AA^{-1} = I$$


The box from the columns of $A$ has volume = $\det A = 2$. The box from the rows also has
volume 2, since $|A^T| = |A|$. The box from the rows of $A^{-1}$ has volume $1/|A| = \frac{1}{2}$.


Problem Set 5.3


Problems 1-5 are about Cramer's Rule for $x = A^{-1}b$.


1 Solve these linear equations by Cramer's Rule $x_j = \det B_j / \det A$:

(a) [2 by 2 system, not legible in the source]

(b) $2x_1 + x_2 = 1$, $\ x_1 + 2x_2 + x_3 = 0$, $\ x_2 + 2x_3 = 0$.


2 Use Cramer's Rule to solve for $y$ (only). Call the 3 by 3 determinant $D$:

(a) $ax + by = 1$, $\ cx + dy = 0$

(b) $ax + by + cz = 1$, $\ dx + ey + fz = 0$, $\ gx + hy + iz = 0$.


3 Cramer's Rule breaks down when $\det A = 0$. Example (a) has no solution while
(b) has infinitely many. What are the ratios $x_j = \det B_j / \det A$ in these two cases?

(a) $2x_1 + 3x_2 = 1$, [second equation not legible in the source] (parallel lines)

(b) $2x_1 + 3x_2 = 1$, [second equation not legible in the source] (same line)


4 Quick proof of Cramer's rule. The determinant is a linear function of column 1. It is
zero if two columns are equal. When $b = Ax = x_1 a_1 + x_2 a_2 + x_3 a_3$ goes into the
first column of $A$, the determinant of this matrix $B_1$ is

$$|b \ \ a_2 \ \ a_3| = |x_1 a_1 + x_2 a_2 + x_3 a_3 \ \ a_2 \ \ a_3| = x_1 |a_1 \ \ a_2 \ \ a_3| = x_1 \det A.$$

(a) What formula for $x_1$ comes from left side = right side?

(b) What steps lead to the middle equation?


5 If the right side $b$ is the first column of $A$, solve the 3 by 3 system $Ax = b$. How
does each determinant in Cramer's Rule lead to this solution $x$?
Problems 6-15 are about $A^{-1} = C^T / \det A$. Remember to transpose $C$.


6 Find $A^{-1}$ from the cofactor formula $C^T / \det A$. Use symmetry in part (b).

$$\text{(a)} \ A = \begin{bmatrix} 1 & 2 & 0 \\ 0 & 3 & 0 \\ 0 & 7 & 1 \end{bmatrix} \qquad \text{(b)} \ A = \begin{bmatrix} 2 & -1 & 0 \\ -1 & 2 & -1 \\ 0 & -1 & 2 \end{bmatrix}$$


7 If all the cofactors are zero, how do you know that $A$ has no inverse? If none of the
cofactors are zero, is $A$ sure to be invertible?


8 Find the cofactors of $A$ and multiply $AC^T$ to find $\det A$:

$$A = \begin{bmatrix} 1 & 1 & 4 \\ 1 & 2 & 2 \\ 1 & 2 & 5 \end{bmatrix} \quad \text{and} \quad C = \begin{bmatrix} 6 & -3 & 0 \\ \cdot & \cdot & \cdot \\ \cdot & \cdot & \cdot \end{bmatrix} \quad \text{and} \quad AC^T = \underline{\qquad}.$$

If you change that 4 to 100, why is $\det A$ unchanged?


9 Suppose $\det A = 1$ and you know all the cofactors in $C$. How can you find $A$?


10 From the formula $AC^T = (\det A)I$ show that $\det C = (\det A)^{n-1}$.


11 If all entries of $A$ are integers, and $\det A = 1$ or $-1$, prove that all entries of $A^{-1}$
are integers. Give a 2 by 2 example with no zero entries.


12 If all entries of $A$ and $A^{-1}$ are integers, prove that $\det A = 1$ or $-1$. Hint: What is
$\det A$ times $\det A^{-1}$?


13 Complete the calculation of $A^{-1}$ by cofactors that was started in Example 5.


14 $L$ is lower triangular and $S$ is symmetric. Assume they are invertible:

To invert triangular $L$ and symmetric $S$
$$L = \begin{bmatrix} a & 0 & 0 \\ b & c & 0 \\ d & e & f \end{bmatrix} \qquad S = \begin{bmatrix} a & b & d \\ b & c & e \\ d & e & f \end{bmatrix}$$

(a) Which three cofactors of $L$ are zero? Then $L^{-1}$ is also lower triangular.

(b) Which three pairs of cofactors of $S$ are equal? Then $S^{-1}$ is also symmetric.

(c) The cofactor matrix $C$ of an orthogonal $Q$ will be ____. Why?


15 For $n = 5$ the matrix $C$ contains ____ cofactors. Each 4 by 4 cofactor contains ____
terms and each term needs ____ multiplications. Compare with $5^3 = 125$
for the Gauss-Jordan computation of $A^{-1}$ in Section 2.4.
Problems 16-26 are about area and volume by determinants. 


16 (a) Find the area of the parallelogram with edges v = (3,2) and w = (1, 4). 
(b) Find the area of the triangle with sides v, w, and v + w. Draw it. 


(c) Find the area of the triangle with sides v, w, and w — v. Draw it. 
17 A box has edges from $(0, 0, 0)$ to $(3, 1, 1)$ and $(1, 3, 1)$ and $(1, 1, 3)$. Find its volume.
Also find the area of each parallelogram face using $\|u \times v\|$.


18 (a) The corners of a triangle are $(2, 1)$ and $(3, 4)$ and $(0, 5)$. What is the area?
(b) Add a corner at $(-1, 0)$ to make a lopsided region (four sides). Find the area.


19 The parallelogram with sides $(2, 1)$ and $(2, 3)$ has the same area as the parallelogram
with sides $(2, 2)$ and $(1, 3)$. Find those areas from 2 by 2 determinants and say why
they must be equal. (I can't see why from a picture. Please write to me if you do.)


20 The Hadamard matrix $H$ has orthogonal rows. The box is a hypercube!

$$\text{What is } |H| = \begin{vmatrix} 1 & 1 & 1 & 1 \\ 1 & 1 & -1 & -1 \\ 1 & -1 & -1 & 1 \\ 1 & -1 & 1 & -1 \end{vmatrix} = \text{volume of a hypercube in } R^4?$$


21 If the columns of a 4 by 4 matrix have lengths $L_1, L_2, L_3, L_4$, what is the largest
possible value for the determinant (based on volume)? If all entries of the matrix are
1 or $-1$, what are those lengths and the maximum determinant?


22 Show by a picture how a rectangle with area $x_1 y_2$ minus a rectangle with area $x_2 y_1$
produces the same area as our parallelogram.


23 When the edge vectors $a, b, c$ are perpendicular, the volume of the box is $\|a\|$ times
$\|b\|$ times $\|c\|$. The matrix $A^T A$ is ____. Find $\det A^T A$ and $\det A$.


24 The box with edges $i$ and $j$ and $w = 2i + 3j + 4k$ has height ____. What is the
volume? What is the matrix with this determinant? What is $i \times j$ and what is its dot
product with $w$?


25 An $n$-dimensional cube has how many corners? How many edges? How many
$(n - 1)$-dimensional faces? The cube in $R^n$ whose edges are the rows of $2I$ has
volume ____. A hypercube computer has parallel processors at the corners with
connections along the edges.


26 The triangle with corners $(0, 0)$, $(1, 0)$, $(0, 1)$ has area $\frac{1}{2}$. The pyramid in $R^3$ with
four corners $(0, 0, 0)$, $(1, 0, 0)$, $(0, 1, 0)$, $(0, 0, 1)$ has volume ____. What is the
volume of a pyramid in $R^4$ with five corners at $(0, 0, 0, 0)$ and the rows of $I$?


Problems 27-30 are about areas dA and volumes dV in calculus. 


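27 Polar coordinates satisfy $x = r \cos\theta$ and $y = r \sin\theta$. Polar area is $J\, dr\, d\theta$:

$$J = \begin{vmatrix} \partial x/\partial r & \partial x/\partial\theta \\ \partial y/\partial r & \partial y/\partial\theta \end{vmatrix} = \begin{vmatrix} \cos\theta & -r\sin\theta \\ \sin\theta & r\cos\theta \end{vmatrix}.$$

The two columns are orthogonal. Their lengths are ____. Thus $J$ = ____.


28 Spherical coordinates $\rho, \phi, \theta$ satisfy $x = \rho \sin\phi \cos\theta$ and $y = \rho \sin\phi \sin\theta$ and
$z = \rho \cos\phi$. Find the 3 by 3 matrix of partial derivatives: $\partial x/\partial\rho$, $\partial x/\partial\phi$, $\partial x/\partial\theta$ in
row 1. Simplify its determinant to $J = \rho^2 \sin\phi$. Then $dV$ in spherical coordinates is
$\rho^2 \sin\phi \, d\rho \, d\phi \, d\theta$, the volume of an infinitesimal "coordinate box".


29 The matrix that connects $r, \theta$ to $x, y$ is in Problem 27. Invert that 2 by 2 matrix:

$$J^{-1} = \begin{vmatrix} \partial r/\partial x & \partial r/\partial y \\ \partial\theta/\partial x & \partial\theta/\partial y \end{vmatrix} = \begin{vmatrix} \cos\theta & ? \\ ? & ? \end{vmatrix} = \ ?$$

It is surprising that $\partial r/\partial x = \partial x/\partial r$ (Calculus, Gilbert Strang, p. 501). Multiplying
the matrices $J$ and $J^{-1}$ gives the chain rule $\dfrac{\partial x}{\partial x} = \dfrac{\partial x}{\partial r}\dfrac{\partial r}{\partial x} + \dfrac{\partial x}{\partial\theta}\dfrac{\partial\theta}{\partial x} = 1$.


30 The triangle with corners $(0, 0)$, $(6, 0)$, and $(\ldots, 4)$ has area ____. When you rotate
it by $\theta = 60°$ the area is ____. The determinant of the rotation matrix is

$$J = \begin{vmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{vmatrix} = \begin{vmatrix} \frac{1}{2} & ? \\ ? & ? \end{vmatrix} = \ ?$$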


Problems 31-38 are about the triple product (u x v) - w in three dimensions. 


31 A box has base area $\|u \times v\|$. Its perpendicular height is $\|w\| \cos\theta$. Base area times
height = volume = $\|u \times v\|\,\|w\| \cos\theta$ which is $(u \times v) \cdot w$. Compute base area,
height, and volume for $u = (2, 4, 0)$, $v = (-1, 3, 0)$, $w = (1, 2, 2)$.


32 The volume of the same box is given more directly by a 3 by 3 determinant. Evaluate
that determinant.


33 Expand the 3 by 3 determinant in equation (13) in cofactors of its row $u_1, u_2, u_3$.
This expansion is the dot product of $u$ with the vector ____.


34 Which of the triple products $(u \times w) \cdot v$ and $(w \times u) \cdot v$ and $(v \times w) \cdot u$ are the
same as $(u \times v) \cdot w$? Which orders of the rows $u, v, w$ give the correct determinant?


35 Let $P = (1, 0, -1)$ and $Q = (1, 1, 1)$ and $R = (2, 2, 1)$. Choose $S$ so that $PQRS$
is a parallelogram and compute its area. Choose $T, U, V$ so that $OPQRSTUV$ is a
tilted box and compute its volume.


36 Suppose $(x, y, z)$ and $(1, 1, 0)$ and $(1, 2, 1)$ lie on a plane through the origin. What
determinant is zero? What equation does this give for the plane?


37 Suppose $(x, y, z)$ is a linear combination of $(2, 3, 1)$ and $(1, 2, 3)$. What determinant
is zero? What equation does this give for the plane of all combinations?


38 (a) Explain from volumes why $\det 2A = 2^n \det A$ for $n$ by $n$ matrices.
(b) For what size matrix is the false statement $\det A + \det A = \det(A + A)$ true?
Challenge Problems


39 If you know all 16 cofactors of a 4 by 4 invertible matrix $A$, how would you find $A$?


40 Suppose $A$ is a 5 by 5 matrix. Its entries in row 1 multiply determinants (cofactors)
in rows 2-5 to give the determinant. Can you guess a "Jacobi formula" for $\det A$
using 2 by 2 determinants from rows 1-2 times 3 by 3 determinants from rows 3-5?
Test your formula on the $-1, 2, -1$ tridiagonal matrix that has determinant = 6.


41 The 2 by 2 matrix $AB$ = (2 by 3)(3 by 2) has a "Cauchy-Binet formula" for $\det AB$:

$\det AB$ = sum of (2 by 2 determinants in $A$) (2 by 2 determinants in $B$)

(a) Guess which 2 by 2 determinants to use from $A$ and $B$.
(b) Test your formula when the rows of $A$ are 1, 2, 3 and 1, 4, 7 with $B = A^T$.


42 The big formula has $n!$ terms. But if an entry of $A$ is zero, $(n - 1)!$ terms disappear.
If $A$ has only three diagonals, how many terms are left?

For $n = 1, 2, 3, 4$ the tridiagonal determinant has 1, 2, 3, 5 terms. Those are
Fibonacci numbers in Section 6.2! Show why a tridiagonal 5 by 5 determinant has
5 + 3 = 8 nonzero terms (Fibonacci again). Use the cofactors of $a_{11}$ and $a_{12}$.


Chapter 6 


Eigenvalues and Eigenvectors 


6.1 Introduction to Eigenvalues 


1 An eigenvector $x$ lies along the same line as $Ax$: $Ax = \lambda x$. The eigenvalue is $\lambda$.

2 If $Ax = \lambda x$ then $A^2 x = \lambda^2 x$ and $A^{-1}x = \lambda^{-1}x$ and $(A + cI)x = (\lambda + c)x$: the same $x$.

3 If $Ax = \lambda x$ then $(A - \lambda I)x = 0$ and $A - \lambda I$ is singular and $\det(A - \lambda I) = 0$. $n$ eigenvalues.

4 Check $\lambda$'s by $\det A = (\lambda_1)(\lambda_2) \cdots (\lambda_n)$ and diagonal sum $a_{11} + a_{22} + \cdots + a_{nn}$ = sum of $\lambda$'s.

5 Projections have $\lambda = 1$ and $0$. Reflections have $1$ and $-1$. Rotations have $e^{i\theta}$ and $e^{-i\theta}$: complex!


This chapter enters a new part of linear algebra. The first part was about Ax = b: 
balance and equilibrium and steady state. Now the second part is about change. Time 
enters the picture—continuous time in a differential equation du/dt = Au or time steps 
in a difference equation $u_{k+1} = Au_k$. Those equations are NOT solved by elimination.


The key idea is to avoid all the complications presented by the matrix A. Suppose 
the solution vector u(t) stays in the direction of a fixed vector x. Then we only need to 
find the number (changing with time) that multiplies x. A number is easier than a vector. 
We want “eigenvectors” x that don’t change direction when you multiply by A. 


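A good model comes from the powers $A, A^2, A^3, \ldots$ of a matrix. Suppose you need
the hundredth power $A^{100}$. Its columns are very close to the eigenvector $(.6, .4)$:


$$A, A^2, A^3 = \begin{bmatrix} .8 & .3 \\ .2 & .7 \end{bmatrix} \quad \begin{bmatrix} .70 & .45 \\ .30 & .55 \end{bmatrix} \quad \begin{bmatrix} .650 & .525 \\ .350 & .475 \end{bmatrix} \qquad A^{100} \approx \begin{bmatrix} .6000 & .6000 \\ .4000 & .4000 \end{bmatrix}$$


$A^{100}$ was found by using the eigenvalues of $A$, not by multiplying 100 matrices. Those
eigenvalues (here they are $\lambda = 1$ and $1/2$) are a new way to see into the heart of a matrix.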
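For comparison, here is the brute-force route that the eigenvalues let us avoid: a short sketch that really does multiply 100 matrices. The helper names `matmul` and `matrix_power` are assumptions for this illustration, and ordinary floating point is used, so the printed entries are approximations.

```python
def matmul(X, Y):
    """Product of two square matrices stored as lists of rows."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def matrix_power(A, k):
    """A multiplied by itself k times (k = 0 gives the identity)."""
    n = len(A)
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(k):
        result = matmul(result, A)
    return result


A = [[0.8, 0.3], [0.2, 0.7]]
for k in (1, 2, 3, 100):
    print(k, matrix_power(A, k))   # both columns approach (.6, .4)
```

The part of each column along $(1, -1)$ is halved at every step, which is why the approach to $(.6, .4)$ is so fast.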
To explain eigenvalues, we first explain eigenvectors. Almost all vectors change
direction, when they are multiplied by $A$. Certain exceptional vectors $x$ are in the same
direction as $Ax$. Those are the "eigenvectors". Multiply an eigenvector by $A$, and the
vector $Ax$ is a number $\lambda$ times the original $x$.


The basic equation is $Ax = \lambda x$. The number $\lambda$ is an eigenvalue of $A$.


The eigenvalue $\lambda$ tells whether the special vector $x$ is stretched or shrunk or reversed or left
unchanged, when it is multiplied by $A$. We may find $\lambda = 2$ or $\frac{1}{2}$ or $-1$ or $1$. The
eigenvalue $\lambda$ could be zero! Then $Ax = 0x$ means that this eigenvector $x$ is in the nullspace.

If $A$ is the identity matrix, every vector has $Ax = x$. All vectors are eigenvectors of $I$.
All eigenvalues "lambda" are $\lambda = 1$. This is unusual to say the least. Most 2 by 2 matrices
have two eigenvector directions and two eigenvalues. We will show that $\det(A - \lambda I) = 0$.

This section will explain how to compute the $x$'s and $\lambda$'s. It can come early in the course
because we only need the determinant of a 2 by 2 matrix. Let me use $\det(A - \lambda I) = 0$ to
find the eigenvalues for this first example, and then derive it properly in equation (3).


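Example 1 The matrix A has two eigenvalues λ = 1 and λ = 1/2. Look at det(A − λI):

$$
A = \begin{bmatrix} .8 & .3 \\ .2 & .7 \end{bmatrix}
\qquad
\det \begin{bmatrix} .8 - \lambda & .3 \\ .2 & .7 - \lambda \end{bmatrix}
= \lambda^2 - \tfrac{3}{2}\lambda + \tfrac{1}{2}
= (\lambda - 1)\left(\lambda - \tfrac{1}{2}\right)
$$

I factored the quadratic into λ − 1 times λ − 1/2, to see the two eigenvalues λ = 1 and
λ = 1/2. For those numbers, the matrix A − λI becomes singular (zero determinant). The
eigenvectors x₁ and x₂ are in the nullspaces of A − I and A − (1/2)I:

(A − I)x₁ = 0 is Ax₁ = x₁ and the first eigenvector is (.6, .4).

(A − (1/2)I)x₂ = 0 is Ax₂ = (1/2)x₂ and the second eigenvector is (1, −1):

$$
x_1 = \begin{bmatrix} .6 \\ .4 \end{bmatrix}
\quad\text{and}\quad
Ax_1 = \begin{bmatrix} .8 & .3 \\ .2 & .7 \end{bmatrix}\begin{bmatrix} .6 \\ .4 \end{bmatrix} = x_1
\qquad (Ax = x \text{ means that } \lambda_1 = 1)
$$

$$
x_2 = \begin{bmatrix} 1 \\ -1 \end{bmatrix}
\quad\text{and}\quad
Ax_2 = \begin{bmatrix} .8 & .3 \\ .2 & .7 \end{bmatrix}\begin{bmatrix} 1 \\ -1 \end{bmatrix} = \begin{bmatrix} .5 \\ -.5 \end{bmatrix}
\qquad (\text{this is } \tfrac{1}{2}x_2 \text{ so } \lambda_2 = \tfrac{1}{2})
$$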


If x₁ is multiplied again by A, we still get x₁. Every power of A will give Aⁿx₁ = x₁.
Multiplying x₂ by A gave (1/2)x₂, and if we multiply again we get (1/2)² times x₂.


When A is squared, the eigenvectors stay the same. The eigenvalues are squared. 


This pattern keeps going, because the eigenvectors stay in their own directions (Figure 6.1)
and never get mixed. The eigenvectors of A¹⁰⁰ are the same x₁ and x₂. The eigenvalues
of A¹⁰⁰ are 1¹⁰⁰ = 1 and (1/2)¹⁰⁰ = very small number.

Other vectors do change direction. But all other vectors are combinations of the two
eigenvectors. The first column of A is the combination x₁ + (.2)x₂:

$$
\text{Separate into eigenvectors, then multiply by } A
\qquad
\begin{bmatrix} .8 \\ .2 \end{bmatrix} = x_1 + (.2)\,x_2
= \begin{bmatrix} .6 \\ .4 \end{bmatrix} + \begin{bmatrix} .2 \\ -.2 \end{bmatrix}. \tag{1}
$$
Figure 6.1: The eigenvectors keep their directions. A²x = λ²x with λ² = 1² and (.5)².
[Figure labels: for λ = 1, Ax₁ = x₁ = (.6, .4) and A²x₁ = (1)²x₁; for λ = .5, x₂ = (1, −1), Ax₂ = λ₂x₂ = (.5, −.5) and A²x₂ = (.5)²x₂ = (.25, −.25).]


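When we multiply separately for x₁ and (.2)x₂, A multiplies x₂ by its eigenvalue 1/2:

$$
\text{Multiply each } x_i \text{ by } \lambda_i
\qquad
A\begin{bmatrix} .8 \\ .2 \end{bmatrix} \text{ is } x_1 + \tfrac{1}{2}(.2)\,x_2
= \begin{bmatrix} .6 \\ .4 \end{bmatrix} + \begin{bmatrix} .1 \\ -.1 \end{bmatrix}
= \begin{bmatrix} .7 \\ .3 \end{bmatrix}.
$$

Each eigenvector is multiplied by its eigenvalue, when we multiply by A. At every step
x₁ is unchanged and x₂ is multiplied by (1/2), so 99 steps give the small number (1/2)⁹⁹:

$$
A^{99}\begin{bmatrix} .8 \\ .2 \end{bmatrix} \text{ is really } x_1 + (.2)\left(\tfrac{1}{2}\right)^{99} x_2
= \begin{bmatrix} .6 \\ .4 \end{bmatrix} + (\text{very small vector}).
$$

This is the first column of A¹⁰⁰. The number we originally wrote as .6000 was not exact.
We left out (.2)(1/2)⁹⁹ which wouldn’t show up for 30 decimal places.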
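The size of that leftover term can be checked exactly. The sketch below uses Python's `fractions` module and the decomposition (.8, .2) = x₁ + (.2)x₂ from the text. The function name `first_column_of_power` is an illustrative choice.

```python
from fractions import Fraction

x1 = [Fraction(6, 10), Fraction(4, 10)]   # eigenvector for lambda = 1
x2 = [Fraction(1), Fraction(-1)]          # eigenvector for lambda = 1/2
c2 = Fraction(2, 10)                      # first column of A is x1 + c2 * x2

def first_column_of_power(n):
    """First column of A^n: x1 stays fixed, the x2 part is halved n-1 times."""
    decay = c2 * Fraction(1, 2) ** (n - 1)
    return [x1[i] + decay * x2[i] for i in range(2)]

column = first_column_of_power(100)
gap = column[0] - Fraction(6, 10)         # exactly (.2)(1/2)^99
print(float(gap))                         # far below 1e-30
```

With exact fractions there is no rounding, so the gap between the true entry and .6 is exactly the decaying term.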


The eigenvector x₁ is a “steady state” that doesn’t change (because λ₁ = 1). The
eigenvector x₂ is a “decaying mode” that virtually disappears (because λ₂ = .5). The
higher the power of A, the more closely its columns approach the steady state.


This particular A is a Markov matrix. Its largest eigenvalue is λ = 1. Its eigenvector
x₁ = (.6, .4) is the steady state, which all columns of Aᵏ will approach. Section 10.3
shows how Markov matrices appear when you search with Google.


For projection matrices P, we can see when Px is parallel to x. The eigenvectors
for λ = 1 and λ = 0 fill the column space and nullspace. The column space doesn’t move
(Px = x). The nullspace goes to zero (Px = 0x).
Example 2 The projection matrix $P = \begin{bmatrix} .5 & .5 \\ .5 & .5 \end{bmatrix}$ has eigenvalues λ = 1 and λ = 0.


Its eigenvectors are x₁ = (1, 1) and x₂ = (1, −1). For those vectors, Px₁ = x₁ (steady
state) and Px₂ = 0 (nullspace). This example illustrates Markov matrices and singular
matrices and (most important) symmetric matrices. All have special λ’s and x’s:


1. Markov matrix: Each column of P adds to 1, so λ = 1 is an eigenvalue.
2. P is singular, so λ = 0 is an eigenvalue.
3. P is symmetric, so its eigenvectors (1, 1) and (1, −1) are perpendicular.


The only eigenvalues of a projection matrix are 0 and 1. The eigenvectors for λ = 0 (which
means Px = 0x) fill up the nullspace. The eigenvectors for λ = 1 (which means Px = x)
fill up the column space. The nullspace is projected to zero. The column space projects
onto itself. The projection keeps the column space and destroys the nullspace:

$$
\text{Project each part}
\qquad
v = \begin{bmatrix} 1 \\ -1 \end{bmatrix} + \begin{bmatrix} 2 \\ 2 \end{bmatrix}
\quad\text{projects onto}\quad
Pv = \begin{bmatrix} 0 \\ 0 \end{bmatrix} + \begin{bmatrix} 2 \\ 2 \end{bmatrix}.
$$


Projections have λ = 0 and 1. Permutations have all |λ| = 1. The next matrix R is a
reflection and at the same time a permutation. R also has special eigenvalues.


Example 3 The reflection matrix $R = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}$ has eigenvalues 1 and −1.


The eigenvector (1, 1) is unchanged by R. The second eigenvector is (1, −1): its signs
are reversed by R. A matrix with no negative entries can still have a negative eigenvalue!
The eigenvectors for R are the same as for P, because reflection = 2(projection) − I:

$$
R = 2P - I
\qquad
\begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}
= 2\begin{bmatrix} .5 & .5 \\ .5 & .5 \end{bmatrix} - \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}. \tag{2}
$$

When a matrix is shifted by I, each λ is shifted by 1. No change in eigenvectors.
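To spell out that step for this example, here is a one-line derivation that uses only R = 2P − I. Suppose Px = λx. Then:

```latex
Rx = (2P - I)x = 2Px - x = (2\lambda - 1)\,x .
```

So λ = 1 for P becomes 2(1) − 1 = 1 for R, and λ = 0 becomes 2(0) − 1 = −1. The eigenvectors (1, 1) and (1, −1) are the same for both matrices.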


Figure 6.2: Projections P have eigenvalues 1 and 0. Reflections R have λ = 1 and −1.
A typical x changes direction, but an eigenvector stays along the same line.
[Left panel: projection onto blue line, with Px₁ = x₁ and Px₂ = 0x₂. Right panel: reflection across line, with Rx₁ = x₁ and Rx₂ = −x₂.]
The Equation for the Eigenvalues


For projection matrices we found λ’s and x’s by geometry: Px = x and Px = 0.
For other matrices we use determinants and linear algebra. This is the key calculation
in the chapter: almost every application starts by solving Ax = λx.

First move λx to the left side. Write the equation Ax = λx as (A − λI)x = 0.
The matrix A − λI times the eigenvector x is the zero vector. The eigenvectors make up
the nullspace of A − λI. When we know an eigenvalue λ, we find an eigenvector by
solving (A − λI)x = 0.


Eigenvalues first. If (A − λI)x = 0 has a nonzero solution, A − λI is not invertible.
The determinant of A − λI must be zero. This is how to recognize an eigenvalue λ:


Eigenvalues The number λ is an eigenvalue of A if and only if A − λI is singular.


Equation for the eigenvalues det(A − λI) = 0. (3)


This “characteristic polynomial” det(A − λI) involves only λ, not x. When A is n by n,
equation (3) has degree n. Then A has n eigenvalues (repeats possible!) Each λ leads to x:


For each eigenvalue λ solve (A − λI)x = 0 or Ax = λx to find an eigenvector x.


Example 4 $A = \begin{bmatrix} 1 & 2 \\ 2 & 4 \end{bmatrix}$ is already singular (zero determinant). Find its λ’s and x’s.

When A is singular, λ = 0 is one of the eigenvalues. The equation Ax = 0x has
solutions. They are the eigenvectors for λ = 0. But det(A − λI) = 0 is the way to find all
λ’s and x’s. Always subtract λI from A:

$$
\text{Subtract } \lambda \text{ from the diagonal to find}
\qquad
A - \lambda I = \begin{bmatrix} 1 - \lambda & 2 \\ 2 & 4 - \lambda \end{bmatrix}. \tag{4}
$$

Take the determinant “ad − bc” of this 2 by 2 matrix. From 1 − λ times 4 − λ,
the “ad” part is λ² − 5λ + 4. The “bc” part, not containing λ, is 2 times 2.

$$
\det \begin{bmatrix} 1 - \lambda & 2 \\ 2 & 4 - \lambda \end{bmatrix}
= (1 - \lambda)(4 - \lambda) - (2)(2) = \lambda^2 - 5\lambda. \tag{5}
$$

Set this determinant λ² − 5λ to zero. One solution is λ = 0 (as expected, since A is
singular). Factoring into λ times λ − 5, the other root is λ = 5:


det(A − λI) = λ² − 5λ = 0 yields the eigenvalues λ₁ = 0 and λ₂ = 5.
Now find the eigenvectors. Solve (A − λI)x = 0 separately for λ₁ = 0 and λ₂ = 5:

$$
(A - 0I)x = \begin{bmatrix} 1 & 2 \\ 2 & 4 \end{bmatrix}\begin{bmatrix} y \\ z \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix}
\ \text{ yields an eigenvector }\
\begin{bmatrix} y \\ z \end{bmatrix} = \begin{bmatrix} 2 \\ -1 \end{bmatrix}
\ \text{ for } \lambda_1 = 0
$$

$$
(A - 5I)x = \begin{bmatrix} -4 & 2 \\ 2 & -1 \end{bmatrix}\begin{bmatrix} y \\ z \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix}
\ \text{ yields an eigenvector }\
\begin{bmatrix} y \\ z \end{bmatrix} = \begin{bmatrix} 1 \\ 2 \end{bmatrix}
\ \text{ for } \lambda_2 = 5.
$$

The matrices A − 0I and A − 5I are singular (because 0 and 5 are eigenvalues). The
eigenvectors (2, −1) and (1, 2) are in the nullspaces: (A − λI)x = 0 is Ax = λx.

We need to emphasize: There is nothing exceptional about λ = 0. Like every other
number, zero might be an eigenvalue and it might not. If A is singular, the eigenvectors
for λ = 0 fill the nullspace: Ax = 0x = 0. If A is invertible, zero is not an eigenvalue.
We shift A by a multiple of I to make it singular.

In the example, the shifted matrix A − 5I is singular and 5 is the other eigenvalue.


Summary To solve the eigenvalue problem for an n by n matrix, follow these steps:


1. Compute the determinant of A − λI. With λ subtracted along the diagonal, this
determinant starts with λⁿ or −λⁿ. It is a polynomial in λ of degree n.


2. Find the roots of this polynomial, by solving det(A − λI) = 0. The n roots are
the n eigenvalues of A. They make A − λI singular.


3. For each eigenvalue λ, solve (A − λI)x = 0 to find an eigenvector x.


A note on the eigenvectors of 2 by 2 matrices. When A − λI is singular, both rows are
multiples of a vector (a, b). The eigenvector is any multiple of (b, −a). The example had


λ = 0 : rows of A − 0I in the direction (1, 2); eigenvector in the direction (2, −1)


λ = 5 : rows of A − 5I in the direction (−4, 2); eigenvector in the direction (2, 4).


Previously we wrote that last eigenvector as (1, 2). Both (1, 2) and (2, 4) are correct.
There is a whole line of eigenvectors: any nonzero multiple of x is as good as x.
MATLAB’s eig(A) divides by the length, to make the eigenvector into a unit vector.
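For a 2 by 2 matrix, the three summary steps and the (b, −a) rule fit in a few lines. The following is a sketch in plain Python, not a general eigenvalue routine. The function names `eigenvalues_2x2` and `eigenvector_2x2` are made up for illustration, and the tolerance `1e-12` is an arbitrary assumption.

```python
import cmath

def eigenvalues_2x2(A):
    """Roots of det(A - lambda I) = lambda^2 - (trace) lambda + det = 0."""
    (a, b), (c, d) = A
    trace = a + d
    det = a * d - b * c
    root = cmath.sqrt(trace * trace - 4 * det)   # complex-safe square root
    return (trace - root) / 2, (trace + root) / 2

def eigenvector_2x2(A, lam):
    """A nonzero solution of (A - lam I)x = 0: a nonzero row (p, q) gives (q, -p)."""
    (a, b), (c, d) = A
    for p, q in ((a - lam, b), (c, d - lam)):
        if abs(p) > 1e-12 or abs(q) > 1e-12:
            return (q, -p)
    return (1, 0)   # A - lam I is the zero matrix: every vector is an eigenvector

A = [[1, 2], [2, 4]]                 # Example 4
for lam in eigenvalues_2x2(A):
    print(lam, eigenvector_2x2(A, lam))
```

For Example 4 this produces eigenvectors along (2, −1) and (2, 4), the same lines found above. The results come back as complex numbers with zero imaginary part because `cmath.sqrt` is used so that matrices with complex eigenvalues also work.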


We must add a warning. Some 2 by 2 matrices have only one line of eigenvectors. 
This can only happen when two eigenvalues are equal. (On the other hand A = I has equal
eigenvalues and plenty of eigenvectors.) Without a full set of eigenvectors, we don’t have a 
basis. We can’t write every v as a combination of eigenvectors. In the language of the next 
section, we can’t diagonalize a matrix without n independent eigenvectors. 
Determinant and Trace


Bad news first: If you add a row of A to another row, or exchange rows, the eigenvalues
usually change. Elimination does not preserve the λ’s. The triangular U has its eigenvalues
sitting along the diagonal (they are the pivots). But they are not the eigenvalues of A!
Eigenvalues are changed when a multiple of row 1 is added to row 2 (here 2 times row 1):

$$
U = \begin{bmatrix} 1 & 3 \\ 0 & 0 \end{bmatrix} \text{ has } \lambda = 0 \text{ and } \lambda = 1;
\qquad
A = \begin{bmatrix} 1 & 3 \\ 2 & 6 \end{bmatrix} \text{ has } \lambda = 0 \text{ and } \lambda = 7.
$$


Good news second: The product λ₁ times λ₂ and the sum λ₁ + λ₂ can be found quickly
from the matrix. For this A, the product is 0 times 7. That agrees with the determinant
(which is 0). The sum of eigenvalues is 0 + 7. That agrees with the sum down the main
diagonal (the trace is 1 + 6). These quick checks always work:


The product of the n eigenvalues equals the determinant.
The sum of the n eigenvalues equals the sum of the n diagonal entries.


The sum of the entries along the main diagonal is called the trace of A:


λ₁ + λ₂ + ··· + λₙ = trace = a₁₁ + a₂₂ + ··· + aₙₙ. (6)


Those checks are very useful. They are proved in Problems 16-17 and again in the next
section. They don’t remove the pain of computing λ’s. But when the computation is wrong,
they generally tell us so. To compute the correct λ’s, go back to det(A − λI) = 0.

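The trace and determinant do tell everything when the matrix is 2 by 2. We never want
to get those wrong! Here trace = 3 and det = 2, so the eigenvalues are λ = 1 and 2:

$$
A = \begin{bmatrix} 1 & 9 \\ 0 & 2 \end{bmatrix}
\ \text{ or }\
\begin{bmatrix} 3 & 1 \\ -2 & 0 \end{bmatrix}
\ \text{ or }\
\begin{bmatrix} 7 & -3 \\ 10 & -4 \end{bmatrix}.
$$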
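The reason trace and determinant are enough in the 2 by 2 case can be written in one line. This is a short derivation from det(A − λI) for a general 2 by 2 matrix, where a, b, c, d are generic entry names:

```latex
\det\begin{bmatrix} a-\lambda & b \\ c & d-\lambda \end{bmatrix}
  = \lambda^2 - (a+d)\lambda + (ad-bc)
  = \lambda^2 - (\text{trace})\,\lambda + \det .
% With trace = 3 and det = 2:
\lambda^2 - 3\lambda + 2 = (\lambda - 1)(\lambda - 2) = 0
  \quad\Longrightarrow\quad \lambda = 1 \text{ and } \lambda = 2 .
```

Any 2 by 2 matrix with trace 3 and determinant 2 leads to this same quadratic, so it has the same two eigenvalues.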


And here is a question about the best matrices for finding eigenvalues : triangular. 


Why do the eigenvalues of a triangular matrix lie along its diagonal? 


Imaginary Eigenvalues 


One more bit of news (not too terrible). The eigenvalues might not be real numbers. 


Example 5 The 90° rotation $Q = \begin{bmatrix} 0 & -1 \\ 1 & 0 \end{bmatrix}$ has no real eigenvectors. Its eigenvalues
are λ₁ = i and λ₂ = −i. Then λ₁ + λ₂ = trace = 0 and λ₁λ₂ = determinant = 1.


After a rotation, no real vector Qx stays in the same direction as x (x = 0 is useless).
There cannot be an eigenvector, unless we go to imaginary numbers. Which we do.

To see how i = √−1 can help, look at Q² which is −I. If Q is rotation through 90°,
then Q² is rotation through 180°. Its eigenvalues are −1 and −1. (Certainly −Ix = −1x.)
Squaring Q will square each λ, so we must have λ² = −1. The eigenvalues of the 90°
rotation matrix Q are +i and −i, because i² = −1.

Those λ’s come as usual from det(Q − λI) = 0. This equation gives λ² + 1 = 0.
Its roots are i and −i. We meet the imaginary number i also in the eigenvectors:

$$
\text{Complex eigenvectors}
\qquad
\begin{bmatrix} 0 & -1 \\ 1 & 0 \end{bmatrix}\begin{bmatrix} 1 \\ i \end{bmatrix} = -i\begin{bmatrix} 1 \\ i \end{bmatrix}
\quad\text{and}\quad
\begin{bmatrix} 0 & -1 \\ 1 & 0 \end{bmatrix}\begin{bmatrix} i \\ 1 \end{bmatrix} = i\begin{bmatrix} i \\ 1 \end{bmatrix}.
$$


Somehow these complex vectors x₁ = (1, i) and x₂ = (i, 1) keep their direction as they are
rotated. Don’t ask me how. This example makes the all-important point that real matrices
can easily have complex eigenvalues and eigenvectors. The particular eigenvalues i and −i
also illustrate two special properties of Q:


1. Q is an orthogonal matrix so the absolute value of each λ is |λ| = 1.
2. Q is a skew-symmetric matrix so each λ is pure imaginary.
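Python's built-in complex numbers make it easy to confirm these claims. The sketch below checks that Qx = λx for the pairs λ = −i with x = (1, i) and λ = i with x = (i, 1), and that each |λ| is 1. The helper name `apply` is an illustrative choice.

```python
Q = [[0, -1], [1, 0]]                      # rotation through 90 degrees

def apply(M, v):
    """A 2 by 2 matrix times a vector with two entries."""
    return [M[0][0] * v[0] + M[0][1] * v[1],
            M[1][0] * v[0] + M[1][1] * v[1]]

# (eigenvalue, eigenvector) pairs to be checked
pairs = [(-1j, [1, 1j]), (1j, [1j, 1])]

for lam, x in pairs:
    Qx = apply(Q, x)
    scaled = [lam * entry for entry in x]
    print(Qx, scaled, abs(lam))            # Qx matches lam * x, and |lam| = 1
```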


A symmetric matrix (Sᵀ = S) can be compared to a real number. A skew-symmetric
matrix (Aᵀ = −A) can be compared to an imaginary number. An orthogonal matrix
(QᵀQ = I) corresponds to a complex number with |λ| = 1. For the eigenvalues of S
and A and Q, those are more than analogies: they are facts to be proved in Section 6.4.

The eigenvectors for all these special matrices are perpendicular. Somehow (i, 1) and
(1, i) are perpendicular (Chapter 9 explains the dot product of complex vectors).


Eigenvalues of AB and A+B 


The first guess about the eigenvalues of AB is not true. An eigenvalue λ of A times an
eigenvalue β of B usually does not give an eigenvalue of AB:


False proof ABx = Aβx = βAx = βλx. (8)


It seems that β times λ is an eigenvalue. When x is an eigenvector for A and B, this
proof is correct. The mistake is to expect that A and B automatically share the same
eigenvector x. Usually they don’t. Eigenvectors of A are not generally eigenvectors of B.
A and B could have all zero eigenvalues while 1 is an eigenvalue of AB:

$$
A = \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix}
\ \text{ and }\
B = \begin{bmatrix} 0 & 0 \\ 1 & 0 \end{bmatrix};
\ \text{ then }\
AB = \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix}
\ \text{ and }\
A + B = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}.
$$

For the same reason, the eigenvalues of A + B are generally not λ + β. Here λ + β = 0
while A + B has eigenvalues 1 and −1. (At least they add to zero.)
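The counterexample can be checked numerically. This sketch computes the eigenvalues of each 2 by 2 matrix from its trace and determinant. The helper names `eig2`, `matmul2` and `add2` are illustrative.

```python
import cmath

def eig2(M):
    """Both eigenvalues of a 2 by 2 matrix, sorted by real part."""
    trace = M[0][0] + M[1][1]
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    root = cmath.sqrt(trace * trace - 4 * det)
    return sorted([(trace - root) / 2, (trace + root) / 2], key=lambda z: z.real)

def matmul2(M, N):
    """Product of two 2 by 2 matrices."""
    return [[sum(M[i][k] * N[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def add2(M, N):
    """Sum of two 2 by 2 matrices."""
    return [[M[i][j] + N[i][j] for j in range(2)] for i in range(2)]

A = [[0, 1], [0, 0]]
B = [[0, 0], [1, 0]]
print(eig2(A), eig2(B))        # all eigenvalues are zero
print(eig2(matmul2(A, B)))     # AB has eigenvalues 0 and 1
print(eig2(add2(A, B)))        # A + B has eigenvalues -1 and 1
```

Neither the products nor the sums of the individual eigenvalues (all zero) predict the eigenvalues of AB or A + B.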
The false proof suggests what is true. Suppose x really is an eigenvector for both A and
B. Then we do have ABx = λβx and BAx = λβx. When all n eigenvectors are shared,
we can multiply eigenvalues. The test AB = BA for shared eigenvectors is important in
quantum mechanics. Time out to mention this application of linear algebra:


A and B share the same n independent eigenvectors if and only if AB = BA.


Heisenberg’s uncertainty principle In quantum mechanics, the position matrix P and
the momentum matrix Q do not commute. In fact QP − PQ = I (these are infinite
matrices). To have Px = 0 at the same time as Qx = 0 would require x = Ix = 0.
If we knew the position exactly, we could not also know the momentum exactly.
Problem 36 derives Heisenberg’s uncertainty principle ‖Px‖ ‖Qx‖ ≥ ½‖x‖².


■ REVIEW OF THE KEY IDEAS ■


1. Ax = λx says that eigenvectors x keep the same direction when multiplied by A.

2. Ax = λx also says that det(A − λI) = 0. This determines n eigenvalues.

3. The eigenvalues of A² and A⁻¹ are λ² and λ⁻¹, with the same eigenvectors.

4. The sum of the λ’s equals the sum down the main diagonal of A (the trace).
The product of the λ’s equals the determinant of A.

5. Projections P, reflections R, 90° rotations Q have special eigenvalues 1, 0, −1, i, −i.
Singular matrices have λ = 0. Triangular matrices have λ’s on their diagonal.

6. Special properties of a matrix lead to special eigenvalues and eigenvectors.
That is a major theme of this chapter (it is captured in a table at the very end).


■ WORKED EXAMPLES ■


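6.1A Find the eigenvalues and eigenvectors of A and A² and A⁻¹ and A + 4I:

$$
A = \begin{bmatrix} 2 & -1 \\ -1 & 2 \end{bmatrix}
\quad\text{and}\quad
A^2 = \begin{bmatrix} 5 & -4 \\ -4 & 5 \end{bmatrix}.
$$

Check the trace λ₁ + λ₂ = 4 and the determinant λ₁λ₂ = 3.


Solution The eigenvalues of A come from det(A − λI) = 0:

$$
A = \begin{bmatrix} 2 & -1 \\ -1 & 2 \end{bmatrix}
\qquad
\det(A - \lambda I) = \begin{vmatrix} 2 - \lambda & -1 \\ -1 & 2 - \lambda \end{vmatrix}
= \lambda^2 - 4\lambda + 3 = 0.
$$

This factors into (λ − 1)(λ − 3) = 0 so the eigenvalues of A are λ₁ = 1 and λ₂ = 3. For
the trace, the sum 2 + 2 agrees with 1 + 3. The determinant 3 agrees with the product λ₁λ₂.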
The eigenvectors come separately by solving (A − λI)x = 0 which is Ax = λx:

$$
\lambda = 1: \quad (A - I)x = \begin{bmatrix} 1 & -1 \\ -1 & 1 \end{bmatrix}\begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix}
\ \text{ gives the eigenvector }\ x_1 = \begin{bmatrix} 1 \\ 1 \end{bmatrix}
$$

$$
\lambda = 3: \quad (A - 3I)x = \begin{bmatrix} -1 & -1 \\ -1 & -1 \end{bmatrix}\begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix}
\ \text{ gives the eigenvector }\ x_2 = \begin{bmatrix} 1 \\ -1 \end{bmatrix}
$$

A² and A⁻¹ and A + 4I keep the same eigenvectors as A. Their eigenvalues are λ² and
λ⁻¹ and λ + 4:

A² has eigenvalues 1² = 1 and 3² = 9. A⁻¹ has 1/1 and 1/3. A + 4I has 1 + 4 = 5 and 3 + 4 = 7.


Notes for later sections: A has orthogonal eigenvectors (Section 6.4 on symmetric
matrices). A can be diagonalized since λ₁ ≠ λ₂ (Section 6.2). A is similar to any 2 by 2
matrix with eigenvalues 1 and 3 (Section 6.2). A is a positive definite matrix (Section 6.5)
since A = Aᵀ and the λ’s are positive.


6.1B How can you estimate the eigenvalues of any A? Gershgorin gave this answer.


Every eigenvalue of A must be “near” at least one of the entries aᵢᵢ on the main diagonal.
For λ to be “near aᵢᵢ” means that |aᵢᵢ − λ| is no more than the sum Rᵢ of all other |aᵢⱼ|
in that row i of the matrix. Then Rᵢ = Σ_{j≠i} |aᵢⱼ| is the radius of a circle centered at aᵢᵢ.


Every λ is in the circle around one or more diagonal entries aᵢᵢ: |aᵢᵢ − λ| ≤ Rᵢ.


Here is the reasoning. If λ is an eigenvalue, then A − λI is not invertible. Then A − λI
cannot be diagonally dominant (see Section 2.5). So at least one diagonal entry aᵢᵢ − λ
is not larger than the sum Rᵢ of all other entries |aᵢⱼ| (we take absolute values!) in row i.


Example 1. Every eigenvalue λ of this A falls into one or both of the Gershgorin circles:
The centers are a and d, the radii are R₁ = |b| and R₂ = |c|.

$$
A = \begin{bmatrix} a & b \\ c & d \end{bmatrix}
\qquad
\begin{array}{ll}
\text{First circle:} & |\lambda - a| \le |b| \\
\text{Second circle:} & |\lambda - d| \le |c|
\end{array}
$$

Those are circles in the complex plane, since λ could certainly be complex.


Example 2. All eigenvalues of this A lie in a circle of radius R = 3 around one or more
of the diagonal entries d₁, d₂, d₃:

$$
A = \begin{bmatrix} d_1 & 1 & 2 \\ 2 & d_2 & 1 \\ -1 & 2 & d_3 \end{bmatrix}
\qquad
\begin{array}{l}
|\lambda - d_1| \le 1 + 2 = R_1 \\
|\lambda - d_2| \le 2 + 1 = R_2 \\
|\lambda - d_3| \le 1 + 2 = R_3
\end{array}
$$

You see that “near” means not more than 3 away from d₁ or d₂ or d₃, for this example.
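The circles are simple to compute. The sketch below builds one (center, radius) pair per row and tests the matrix from Worked Example 6.1A, whose eigenvalues 1 and 3 were found by hand. The function names `gershgorin_discs` and `in_some_disc` and the small tolerance are assumptions made for illustration.

```python
def gershgorin_discs(A):
    """One (center, radius) pair per row: center a_ii, radius = sum of |a_ij| for j != i."""
    discs = []
    for i, row in enumerate(A):
        radius = sum(abs(entry) for j, entry in enumerate(row) if j != i)
        discs.append((row[i], radius))
    return discs

def in_some_disc(lam, discs, tol=1e-12):
    """True when lam lies in at least one closed disc |a_ii - lam| <= R_i."""
    return any(abs(center - lam) <= radius + tol for center, radius in discs)

A = [[2, -1], [-1, 2]]        # Worked Example 6.1A, eigenvalues 1 and 3
discs = gershgorin_discs(A)   # both rows give center 2 and radius 1
for lam in (1, 3):
    print(lam, in_some_disc(lam, discs))
```

Here both eigenvalues sit exactly on the boundary of the circle, so the estimate |2 − λ| ≤ 1 cannot be improved for this matrix.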
6.1C Find the eigenvalues and eigenvectors of this symmetric 3 by 3 matrix S:

$$
\begin{array}{l}
\text{Symmetric matrix} \\
\text{Singular matrix} \\
\text{Trace } 1 + 2 + 1 = 4
\end{array}
\qquad
S = \begin{bmatrix} 1 & -1 & 0 \\ -1 & 2 & -1 \\ 0 & -1 & 1 \end{bmatrix}
$$

Solution Since all rows of S add to zero, the vector x = (1, 1, 1) gives Sx = 0.
This is an eigenvector for λ = 0. To find λ₂ and λ₃ I will compute the 3 by 3 determinant:

$$
\det(S - \lambda I) = \begin{vmatrix} 1 - \lambda & -1 & 0 \\ -1 & 2 - \lambda & -1 \\ 0 & -1 & 1 - \lambda \end{vmatrix}
= (1 - \lambda)^2(2 - \lambda) - 2(1 - \lambda)
= (1 - \lambda)\left[(1 - \lambda)(2 - \lambda) - 2\right]
= (1 - \lambda)(-\lambda)(3 - \lambda).
$$

Those three factors give λ = 0, 1, 3. Each eigenvalue corresponds to an eigenvector (or a
line of eigenvectors):

$$
x_1 = \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix} \quad Sx_1 = 0x_1
\qquad
x_2 = \begin{bmatrix} 1 \\ 0 \\ -1 \end{bmatrix} \quad Sx_2 = 1x_2
\qquad
x_3 = \begin{bmatrix} 1 \\ -2 \\ 1 \end{bmatrix} \quad Sx_3 = 3x_3.
$$

I notice again that eigenvectors are perpendicular when S is symmetric. We were lucky to
find λ = 0, 1, 3. For a larger matrix I would use eig(A), and never touch determinants.
The full command [X, E] = eig(A) will produce unit eigenvectors in the columns of X.
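Without MATLAB at hand, the answers to 6.1C can still be verified with integer arithmetic. This is a small plain-Python sketch in which `apply` and `dot` are illustrative helper names.

```python
S = [[1, -1, 0], [-1, 2, -1], [0, -1, 1]]

def apply(M, v):
    """Matrix times vector, using lists of rows."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def dot(u, v):
    """Dot product of two real vectors."""
    return sum(a * b for a, b in zip(u, v))

# (eigenvalue, eigenvector) pairs from the solution
pairs = [(0, [1, 1, 1]), (1, [1, 0, -1]), (3, [1, -2, 1])]

for lam, x in pairs:
    print(lam, apply(S, x) == [lam * entry for entry in x])   # is S x equal to lam x?

vectors = [x for _, x in pairs]
print(dot(vectors[0], vectors[1]), dot(vectors[0], vectors[2]), dot(vectors[1], vectors[2]))
```

The three dot products check the perpendicularity of the eigenvectors, and the eigenvalues 0 + 1 + 3 add to the trace 4.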


Problem Set 6.1 


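1 The example at the start of the chapter has powers of this matrix A:

$$
A = \begin{bmatrix} .8 & .3 \\ .2 & .7 \end{bmatrix}
\ \text{ and }\
A^2 = \begin{bmatrix} .70 & .45 \\ .30 & .55 \end{bmatrix}
\ \text{ and }\
A^\infty = \begin{bmatrix} .6 & .6 \\ .4 & .4 \end{bmatrix}.
$$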


Find the eigenvalues of these matrices. All powers have the same eigenvectors. 


(a) Show from A how a row exchange can produce different eigenvalues. 


(b) Why is a zero eigenvalue not changed by the steps of elimination? 


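2 Find the eigenvalues and the eigenvectors of these two matrices:

$$
A = \begin{bmatrix} 1 & 4 \\ 2 & 3 \end{bmatrix}
\quad\text{and}\quad
A + I = \begin{bmatrix} 2 & 4 \\ 2 & 4 \end{bmatrix}.
$$

A + I has the ____ eigenvectors as A. Its eigenvalues are ____ by 1.

3 Compute the eigenvalues and eigenvectors of A and A⁻¹. Check the trace!

$$
A = \begin{bmatrix} 0 & 2 \\ 1 & 1 \end{bmatrix}
\quad\text{and}\quad
A^{-1} = \begin{bmatrix} -1/2 & 1 \\ 1/2 & 0 \end{bmatrix}.
$$

A⁻¹ has the ____ eigenvectors as A. When A has eigenvalues λ₁ and λ₂, its inverse
has eigenvalues ____.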
4 Compute the eigenvalues and eigenvectors of A and A²:

$$
A = \begin{bmatrix} -1 & 3 \\ 2 & 0 \end{bmatrix}
\quad\text{and}\quad
A^2 = \begin{bmatrix} 7 & -3 \\ -2 & 6 \end{bmatrix}.
$$

A² has the same ____ as A. When A has eigenvalues λ₁ and λ₂, A² has eigenvalues
____. In this example, why is λ₁² + λ₂² = 13?


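5 Find the eigenvalues of A and B (easy for triangular matrices) and A + B:

$$
A = \begin{bmatrix} 3 & 0 \\ 1 & 1 \end{bmatrix}
\ \text{ and }\
B = \begin{bmatrix} 1 & 1 \\ 0 & 3 \end{bmatrix}
\ \text{ and }\
A + B = \begin{bmatrix} 4 & 1 \\ 1 & 4 \end{bmatrix}.
$$

Eigenvalues of A + B (are equal to)(are not equal to) eigenvalues of A plus eigenvalues of B.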


6 Find the eigenvalues of A and B and AB and BA:

$$
A = \begin{bmatrix} 1 & 0 \\ 1 & 1 \end{bmatrix}
\ \text{ and }\
B = \begin{bmatrix} 1 & 2 \\ 0 & 1 \end{bmatrix}
\ \text{ and }\
AB = \begin{bmatrix} 1 & 2 \\ 1 & 3 \end{bmatrix}
\ \text{ and }\
BA = \begin{bmatrix} 3 & 2 \\ 1 & 1 \end{bmatrix}.
$$

(a) Are the eigenvalues of AB equal to eigenvalues of A times eigenvalues of B?
(b) Are the eigenvalues of AB equal to the eigenvalues of BA?

7 Elimination produces A = LU. The eigenvalues of U are on its diagonal; they
are the ____. The eigenvalues of L are on its diagonal; they are all ____. The
eigenvalues of A are not the same as ____.

8 (a) If you know that x is an eigenvector, the way to find λ is to ____.
(b) If you know that λ is an eigenvalue, the way to find x is to ____.

9 What do you do to the equation Ax = λx, in order to prove (a), (b), and (c)?

(a) λ² is an eigenvalue of A², as in Problem 4.
(b) λ⁻¹ is an eigenvalue of A⁻¹, as in Problem 3.
(c) λ + 1 is an eigenvalue of A + I, as in Problem 2.

10 Find the eigenvalues and eigenvectors for both of these Markov matrices A and A^∞.
Explain from those answers why A¹⁰⁰ is close to A^∞:

$$
A = \begin{bmatrix} .6 & .2 \\ .4 & .8 \end{bmatrix}
\quad\text{and}\quad
A^\infty = \begin{bmatrix} 1/3 & 1/3 \\ 2/3 & 2/3 \end{bmatrix}.
$$

11 Here is a strange fact about 2 by 2 matrices with eigenvalues λ₁ ≠ λ₂: The columns
of A − λ₁I are multiples of the eigenvector x₂. Any idea why this should be?
12 Find three eigenvectors for this matrix P (projection matrices have λ = 1 and 0):

$$
\text{Projection matrix}
\qquad
P = \begin{bmatrix} .2 & .4 & 0 \\ .4 & .8 & 0 \\ 0 & 0 & 1 \end{bmatrix}.
$$

If two eigenvectors share the same λ, so do all their linear combinations. Find an
eigenvector of P with no zero components.

13 From the unit vector u = (1/6, 1/6, 3/6, 5/6) construct the rank one projection matrix
P = uuᵀ. This matrix has P² = P because uᵀu = 1.

(a) Pu = u comes from (uuᵀ)u = u(____). Then u is an eigenvector with λ = 1.

(b) If v is perpendicular to u show that Pv = 0. Then λ = 0.

(c) Find three independent eigenvectors of P all with eigenvalue λ = 0.


14 Solve det(Q − λI) = 0 by the quadratic formula to reach λ = cos θ ± i sin θ:

$$
Q = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix}
\ \text{ rotates the } xy \text{ plane by the angle } \theta. \text{ No real } \lambda\text{’s.}
$$

Find the eigenvectors of Q by solving (Q − λI)x = 0. Use i² = −1.

15 Every permutation matrix leaves x = (1, 1, ..., 1) unchanged. Then λ = 1. Find
two more λ’s (possibly complex) for these permutations, from det(P − λI) = 0:

$$
P = \begin{bmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 1 & 0 & 0 \end{bmatrix}
\quad\text{and}\quad
P = \begin{bmatrix} 0 & 0 & 1 \\ 0 & 1 & 0 \\ 1 & 0 & 0 \end{bmatrix}.
$$

16 The determinant of A equals the product λ₁λ₂···λₙ. Start with the polynomial
det(A − λI) separated into its n factors (always possible). Then set λ = 0:

det(A − λI) = (λ₁ − λ)(λ₂ − λ)···(λₙ − λ) so det A = ____.

Check this rule in Example 1 where the Markov matrix has λ = 1 and 1/2.


17 The sum of the diagonal entries (the trace) equals the sum of the eigenvalues:

$$
A = \begin{bmatrix} a & b \\ c & d \end{bmatrix}
\ \text{ has }\
\det(A - \lambda I) = \lambda^2 - (a + d)\lambda + ad - bc = 0.
$$

The quadratic formula gives the eigenvalues λ = (a + d + √____)/2 and λ = ____.
Their sum is ____. If A has λ₁ = 3 and λ₂ = 4 then det(A − λI) = ____.

18 If A has λ₁ = 4 and λ₂ = 5 then det(A − λI) = (λ − 4)(λ − 5) = λ² − 9λ + 20.
Find three matrices that have trace a + d = 9 and determinant 20 and λ = 4, 5.
19 A 3 by 3 matrix B is known to have eigenvalues 0, 1, 2. This information is enough
to find three of these (give the answers where possible):

(a) the rank of B

(b) the determinant of BᵀB

(c) the eigenvalues of BᵀB

(d) the eigenvalues of (B² + I)⁻¹.


20 Choose the last rows of A and C to give eigenvalues 4, 7 and 1, 2, 3:

$$
\text{Companion matrices}
\qquad
A = \begin{bmatrix} 0 & 1 \\ * & * \end{bmatrix}
\qquad
C = \begin{bmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ * & * & * \end{bmatrix}
$$

21 The eigenvalues of A equal the eigenvalues of Aᵀ. This is because det(A − λI)
equals det(Aᵀ − λI). That is true because ____. Show by an example that the
eigenvectors of A and Aᵀ are not the same.

22 Construct any 3 by 3 Markov matrix M: positive entries down each column add to 1.
Show that Mᵀ(1, 1, 1) = (1, 1, 1). By Problem 21, λ = 1 is also an eigenvalue
of M. Challenge: A 3 by 3 singular Markov matrix with trace 1/2 has what λ’s?

23 Find three 2 by 2 matrices that have λ₁ = λ₂ = 0. The trace is zero and the
determinant is zero. A might not be the zero matrix but check that A² = 0.


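24 This matrix is singular with rank one. Find three λ’s and three eigenvectors:

$$
A = \begin{bmatrix} 1 \\ 2 \\ 1 \end{bmatrix}\begin{bmatrix} 2 & 1 & 2 \end{bmatrix}
= \begin{bmatrix} 2 & 1 & 2 \\ 4 & 2 & 4 \\ 2 & 1 & 2 \end{bmatrix}.
$$

25 Suppose A and B have the same eigenvalues λ₁, ..., λₙ with the same independent
eigenvectors x₁, ..., xₙ. Then A = B. Reason: Any vector x is a combination
c₁x₁ + ··· + cₙxₙ. What is Ax? What is Bx?

26 The block B has eigenvalues 1, 2 and C has eigenvalues 3, 4 and D has eigenvalues
5, 7. Find the eigenvalues of the 4 by 4 matrix A:

$$
A = \begin{bmatrix} B & C \\ 0 & D \end{bmatrix}
= \begin{bmatrix} 0 & 1 & 3 & 0 \\ -2 & 3 & 0 & 4 \\ 0 & 0 & 6 & 1 \\ 0 & 0 & 1 & 6 \end{bmatrix}.
$$

27 Find the rank and the four eigenvalues of A and C:

$$
A = \begin{bmatrix} 1 & 1 & 1 & 1 \\ 1 & 1 & 1 & 1 \\ 1 & 1 & 1 & 1 \\ 1 & 1 & 1 & 1 \end{bmatrix}
\quad\text{and}\quad
C = \begin{bmatrix} 1 & 0 & 1 & 0 \\ 0 & 1 & 0 & 1 \\ 1 & 0 & 1 & 0 \\ 0 & 1 & 0 & 1 \end{bmatrix}.
$$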
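28 Subtract I from the previous A. Find the λ’s and then the determinants of

$$
B = A - I = \begin{bmatrix} 0 & 1 & 1 & 1 \\ 1 & 0 & 1 & 1 \\ 1 & 1 & 0 & 1 \\ 1 & 1 & 1 & 0 \end{bmatrix}
\quad\text{and}\quad
C = I - A = \begin{bmatrix} 0 & -1 & -1 & -1 \\ -1 & 0 & -1 & -1 \\ -1 & -1 & 0 & -1 \\ -1 & -1 & -1 & 0 \end{bmatrix}.
$$

29 (Review) Find the eigenvalues of A, B, and C:

$$
A = \begin{bmatrix} 1 & 2 & 3 \\ 0 & 4 & 5 \\ 0 & 0 & 6 \end{bmatrix}
\ \text{ and }\
B = \begin{bmatrix} 0 & 0 & 1 \\ 0 & 2 & 0 \\ 3 & 0 & 0 \end{bmatrix}
\ \text{ and }\
C = \begin{bmatrix} 2 & 2 & 2 \\ 2 & 2 & 2 \\ 2 & 2 & 2 \end{bmatrix}.
$$

30 When a + b = c + d show that (1, 1) is an eigenvector and find both eigenvalues:

$$
A = \begin{bmatrix} a & b \\ c & d \end{bmatrix}.
$$

31 If we exchange rows 1 and 2 and columns 1 and 2, the eigenvalues don’t change.
Find eigenvectors of A and B for λ = 11. Rank one gives λ₂ = λ₃ = 0.

$$
A = \begin{bmatrix} 1 & 2 & 1 \\ 3 & 6 & 3 \\ 4 & 8 & 4 \end{bmatrix}
\quad\text{and}\quad
B = PAP^{\mathrm{T}} = \begin{bmatrix} 6 & 3 & 3 \\ 2 & 1 & 1 \\ 8 & 4 & 4 \end{bmatrix}.
$$

32 Suppose A has eigenvalues 0, 3, 5 with independent eigenvectors u, v, w.

(a) Give a basis for the nullspace and a basis for the column space.
(b) Find a particular solution to Ax = v + w. Find all solutions.
(c) Ax = u has no solution. If it did then ____ would be in the column space.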


Challenge Problems 


33 Show that u is an eigenvector of the rank one 2 × 2 matrix A = uvᵀ. Find both
eigenvalues of A. Check that λ₁ + λ₂ agrees with the trace u₁v₁ + u₂v₂.

34 Find the eigenvalues of this permutation matrix P from det(P − λI) = 0. Which
vectors are not changed by the permutation? They are eigenvectors for λ = 1. Can
you find three more eigenvectors?

$$
P = \begin{bmatrix} 0 & 0 & 0 & 1 \\ 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \end{bmatrix}.
$$

35 There are six 3 by 3 permutation matrices P. What numbers can be the determinants
of P? What numbers can be pivots? What numbers can be the trace of P? What
four numbers can be eigenvalues of P, as in Problem 15?


36 (Heisenberg’s Uncertainty Principle) AB − BA = I can happen for infinite matrices
with A = Aᵀ and B = −Bᵀ. Then

xᵀx = xᵀABx − xᵀBAx ≤ 2‖Ax‖ ‖Bx‖.

Explain that last step by using the Schwarz inequality |uᵀv| ≤ ‖u‖ ‖v‖. Then
Heisenberg’s inequality says that ‖Ax‖/‖x‖ times ‖Bx‖/‖x‖ is at least 1/2.
It is impossible to get the position error and momentum error both very small.

37 Find a 2 by 2 rotation matrix (other than I) with A³ = I. Its eigenvalues must satisfy
λ³ = 1. They can be e^{2πi/3} and e^{−2πi/3}. What are the trace and determinant?

38 (a) Find the eigenvalues and eigenvectors of A. They depend on c:

$$
A = \begin{bmatrix} .4 & 1 - c \\ .6 & c \end{bmatrix}.
$$

(b) Show that A has just one line of eigenvectors when c = 1.6.

(c) This is a Markov matrix when c = .8. Then Aⁿ will approach what matrix A^∞?


Eigshow in MATLAB 


There is a MATLAB demo (just type eigshow), displaying the eigenvalue problem for a 2
by 2 matrix. It starts with the unit vector x = (1, 0). The mouse makes this vector move
around the unit circle. At the same time the screen shows Ax, in color and also moving.
Possibly Ax is ahead of x. Possibly Ax is behind x. Sometimes Ax is parallel to x.

At that parallel moment, Ax = λx (at x₁ and x₂ in the second figure).

[Figure, left panel “These are not eigenvectors”: x = (1, 0) with Ax = (0.8, 0.2), and y = (0, 1) with Ay = (0.3, 0.7). Right panel “Ax lines up with x at eigenvectors”: the circle of x’s, with Ax₁ = x₁.]

The eigenvalue λ is the length of Ax, when the unit eigenvector x lines up. The built-in
choices for A illustrate three possibilities: 0, 1, or 2 real vectors where Ax crosses x.
The axes of the ellipse are singular vectors in 7.4, and eigenvectors if Aᵀ = A.
6.2 Diagonalizing a Matrix


1 The columns of AX = XΛ are Axₖ = λₖxₖ. The eigenvalue matrix Λ is diagonal.

2 n independent eigenvectors in X diagonalize A: A = XΛX⁻¹ and Λ = X⁻¹AX

3 The eigenvector matrix X also diagonalizes all powers Aᵏ: Aᵏ = XΛᵏX⁻¹

4 Solve uₖ₊₁ = Auₖ by uₖ = Aᵏu₀ = XΛᵏX⁻¹u₀ = c₁(λ₁)ᵏx₁ + ··· + cₙ(λₙ)ᵏxₙ

5 No equal eigenvalues ⇒ X is invertible and A can be diagonalized.
Equal eigenvalues ⇒ A might have too few independent eigenvectors. Then X⁻¹ fails.

6 Every matrix C = B⁻¹AB has the same eigenvalues as A. These C’s are “similar” to A.


When x is an eigenvector, multiplication by A is just multiplication by a number λ:
Ax = λx. All the difficulties of matrices are swept away. Instead of an interconnected
system, we can follow the eigenvectors separately. It is like having a diagonal matrix,
with no off-diagonal interconnections. The 100th power of a diagonal matrix is easy.


The point of this section is very direct. The matrix A turns into a diagonal matrix Λ
when we use the eigenvectors properly. This is the matrix form of our key idea. We start
right off with that one essential computation. The next page explains why AX = XΛ.


Diagonalization Suppose the n by n matrix A has n linearly independent eigenvectors
x₁, ..., xₙ. Put them into the columns of an eigenvector matrix X. Then X⁻¹AX is
the eigenvalue matrix Λ:

$$
\begin{array}{l}
\text{Eigenvector matrix } X \\
\text{Eigenvalue matrix } \Lambda
\end{array}
\qquad
X^{-1}AX = \Lambda = \begin{bmatrix} \lambda_1 & & \\ & \ddots & \\ & & \lambda_n \end{bmatrix}. \tag{1}
$$

The matrix A is “diagonalized.” We use capital lambda for the eigenvalue matrix,
because the small λ’s (the eigenvalues) are on its diagonal.


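Example 1 This A is triangular so its eigenvalues are on the diagonal: λ = 1 and λ = 6.

$$
\text{Eigenvectors } \begin{bmatrix} 1 \\ 0 \end{bmatrix}, \begin{bmatrix} 1 \\ 1 \end{bmatrix} \text{ go into } X
\qquad
\underbrace{\begin{bmatrix} 1 & -1 \\ 0 & 1 \end{bmatrix}}_{X^{-1}}
\underbrace{\begin{bmatrix} 1 & 5 \\ 0 & 6 \end{bmatrix}}_{A}
\underbrace{\begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix}}_{X}
=
\underbrace{\begin{bmatrix} 1 & 0 \\ 0 & 6 \end{bmatrix}}_{\Lambda}
$$

In other words A = XΛX⁻¹. Then watch A² = XΛX⁻¹XΛX⁻¹. So A² is XΛ²X⁻¹.


A² has the same eigenvectors in X and squared eigenvalues in Λ².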
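Why is AX = XΛ? A multiplies its eigenvectors, which are the columns of X. The
first column of AX is Ax₁. That is λ₁x₁. Each column of X is multiplied by its eigenvalue:

$$
A \text{ times } X
\qquad
AX = A\begin{bmatrix} x_1 & \cdots & x_n \end{bmatrix}
= \begin{bmatrix} \lambda_1 x_1 & \cdots & \lambda_n x_n \end{bmatrix}.
$$

The trick is to split this matrix AX into X times Λ:

$$
X \text{ times } \Lambda
\qquad
\begin{bmatrix} \lambda_1 x_1 & \cdots & \lambda_n x_n \end{bmatrix}
= \begin{bmatrix} x_1 & \cdots & x_n \end{bmatrix}
\begin{bmatrix} \lambda_1 & & \\ & \ddots & \\ & & \lambda_n \end{bmatrix}
= X\Lambda.
$$

Keep those matrices in the right order! Then λ₁ multiplies the first column x₁, as shown.
The diagonalization is complete, and we can write AX = XΛ in two good ways:

$$
AX = X\Lambda \ \text{ is } \ X^{-1}AX = \Lambda \ \text{ or } \ A = X\Lambda X^{-1}. \tag{2}
$$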


The matrix X has an inverse, because its columns (the eigenvectors of A) were assumed to
be linearly independent. Without n independent eigenvectors, we can’t diagonalize.

A and Λ have the same eigenvalues λ₁, ..., λₙ. The eigenvectors are different. The
job of the original eigenvectors x₁, ..., xₙ was to diagonalize A. Those eigenvectors in X
produce A = XΛX⁻¹. You will soon see their simplicity and importance and meaning.
The kth power will be Aᵏ = XΛᵏX⁻¹ which is easy to compute:

$$
A^k = (X\Lambda X^{-1})(X\Lambda X^{-1})\cdots(X\Lambda X^{-1}) = X\Lambda^k X^{-1}.
$$

$$
\text{Powers of } A \text{ (Example 1)}
\qquad
\begin{bmatrix} 1 & 5 \\ 0 & 6 \end{bmatrix}^k
= \begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix}
\begin{bmatrix} 1 & \\ & 6^k \end{bmatrix}
\begin{bmatrix} 1 & -1 \\ 0 & 1 \end{bmatrix}
= \begin{bmatrix} 1 & 6^k - 1 \\ 0 & 6^k \end{bmatrix} = A^k.
$$

With k = 1 we get A. With k = 0 we get A⁰ = I (and λ⁰ = 1). With k = −1 we get A⁻¹.


You can see how A² = [1 35; 0 36] fits that formula when k = 2.
Here are four small remarks before we use A again in Example 2. 
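The power formula for Example 1 can be checked numerically. The following Python sketch is not part of the original text; the function names are illustrative assumptions. It compares $X\Lambda^kX^{-1}$ with repeated multiplication by $A$, using exact fractions so that $k = -1$ also works.

```python
from fractions import Fraction


def matmul(P, Q):
    """Multiply two 2x2 matrices stored as nested lists."""
    return [[sum(P[i][k] * Q[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def power_by_diagonalization(k):
    """A^k = X Lambda^k X^{-1} for A = [[1, 5], [0, 6]] from Example 1."""
    X = [[1, 1], [0, 1]]          # eigenvectors (1, 0) and (1, 1) as columns
    X_inv = [[1, -1], [0, 1]]     # inverse of X
    # Only the diagonal eigenvalue matrix is raised to the power k.
    Lam_k = [[Fraction(1) ** k, 0], [0, Fraction(6) ** k]]
    return matmul(matmul(X, Lam_k), X_inv)


def power_by_repeated_multiplication(k):
    """A^k by k ordinary matrix multiplications (k >= 0)."""
    A = [[1, 5], [0, 6]]
    result = [[1, 0], [0, 1]]
    for _ in range(k):
        result = matmul(result, A)
    return result
```

Calling `power_by_diagonalization(2)` reproduces the matrix with entries 1, 35, 0, 36, and `k = -1` gives the inverse without any elimination.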


Remark 1  Suppose the eigenvalues $\lambda_1, \ldots, \lambda_n$ are all different. Then it is automatic that
the eigenvectors $x_1, \ldots, x_n$ are independent. The eigenvector matrix $X$ will be invertible.
Any matrix that has no repeated eigenvalues can be diagonalized.

Remark 2  We can multiply eigenvectors by any nonzero constants. $A(cx) = \lambda(cx)$ is
still true. In Example 1, we can divide $x = (1,1)$ by $\sqrt{2}$ to produce a unit vector.
MATLAB and virtually all other codes produce eigenvectors of length $\|x\| = 1$.

Remark 3  The eigenvectors in $X$ come in the same order as the eigenvalues in $\Lambda$.
To reverse the order in $\Lambda$, put the eigenvector $(1, 1)$ before $(1, 0)$ in $X$:

$$\text{New order } 6, 1 \qquad \begin{bmatrix} 0 & 1 \\ 1 & -1 \end{bmatrix}\begin{bmatrix} 1 & 5 \\ 0 & 6 \end{bmatrix}\begin{bmatrix} 1 & 1 \\ 1 & 0 \end{bmatrix} = \begin{bmatrix} 6 & 0 \\ 0 & 1 \end{bmatrix} = \Lambda_{\text{new}}$$

To diagonalize $A$ we must use an eigenvector matrix. From $X^{-1}AX = \Lambda$ we know
that $AX = X\Lambda$. Suppose the first column of $X$ is $x$. Then the first columns of $AX$ and
$X\Lambda$ are $Ax$ and $\lambda_1x$. For those to be equal, $x$ must be an eigenvector.


Remark 4 (repeated warning for repeated eigenvalues) Some matrices have too few 
eigenvectors. Those matrices cannot be diagonalized. Here are two examples: 


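$$\text{Not diagonalizable} \qquad A = \begin{bmatrix} 1 & -1 \\ 1 & -1 \end{bmatrix} \quad \text{and} \quad B = \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix}$$

Their eigenvalues happen to be 0 and 0. Nothing is special about $\lambda = 0$, the problem is the
repetition of $\lambda$. All eigenvectors of the first matrix are multiples of $(1, 1)$:

$$\text{Only one line of eigenvectors} \qquad Ax = 0x \ \text{ means } \ \begin{bmatrix} 1 & -1 \\ 1 & -1 \end{bmatrix}\begin{bmatrix} x \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix} \ \text{ and } \ x = c\begin{bmatrix} 1 \\ 1 \end{bmatrix}.$$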


There is no second eigenvector, so this unusual matrix A cannot be diagonalized. 

Those matrices are the best examples to test any statement about eigenvectors. In many 
true-false questions, non-diagonalizable matrices lead to false. 

Remember that there is no connection between invertibility and diagonalizability: 


- Invertibility is concerned with the eigenvalues ($\lambda = 0$ or $\lambda \neq 0$).
- Diagonalizability is concerned with the eigenvectors (too few or enough for $X$).


Each eigenvalue has at least one eigenvector! $A - \lambda I$ is singular. If $(A - \lambda I)x = 0$ leads
you to $x = 0$, $\lambda$ is not an eigenvalue. Look for a mistake in solving $\det(A - \lambda I) = 0$.


Eigenvectors for $n$ different $\lambda$’s are independent. Then we can diagonalize $A$.


Independent $x$ from different $\lambda$  Eigenvectors $x_1, \ldots, x_j$ that correspond to
distinct (all different) eigenvalues are linearly independent. An $n$ by $n$ matrix
that has $n$ different eigenvalues (no repeated $\lambda$’s) must be diagonalizable.


Proof  Suppose $c_1x_1 + c_2x_2 = 0$. Multiply by $A$ to find $c_1\lambda_1x_1 + c_2\lambda_2x_2 = 0$. Multiply
by $\lambda_2$ to find $c_1\lambda_2x_1 + c_2\lambda_2x_2 = 0$. Now subtract one from the other:

$$\text{Subtraction leaves} \quad (\lambda_1 - \lambda_2)c_1x_1 = 0. \quad \text{Therefore } c_1 = 0.$$

Since the $\lambda$’s are different and $x_1 \neq 0$, we are forced to the conclusion that $c_1 = 0$.
Similarly $c_2 = 0$. Only the combination with $c_1 = c_2 = 0$ gives $c_1x_1 + c_2x_2 = 0$. So the
eigenvectors $x_1$ and $x_2$ must be independent.

This proof extends directly to $j$ eigenvectors. Suppose that $c_1x_1 + \cdots + c_jx_j = 0$.
Multiply by $A$, multiply by $\lambda_j$, and subtract. This multiplies $x_j$ by $\lambda_j - \lambda_j = 0$, and $x_j$ is
gone. Now multiply by $A$ and by $\lambda_{j-1}$ and subtract. This removes $x_{j-1}$. Eventually only
$x_1$ is left:

$$\text{We reach} \quad (\lambda_1 - \lambda_2)\cdots(\lambda_1 - \lambda_j)c_1x_1 = 0 \quad \text{which forces } c_1 = 0. \qquad (3)$$

Similarly every $c_i = 0$. When the $\lambda$’s are all different, the eigenvectors are independent.
A full set of eigenvectors can go into the columns of the eigenvector matrix $X$.


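Example 2  Powers of $A$  The Markov matrix $A = \begin{bmatrix} .8 & .3 \\ .2 & .7 \end{bmatrix}$ in the last section had
$\lambda_1 = 1$ and $\lambda_2 = .5$. Here is $A = X\Lambda X^{-1}$ with those eigenvalues in the diagonal $\Lambda$:

$$\text{Markov example} \qquad \begin{bmatrix} .8 & .3 \\ .2 & .7 \end{bmatrix} = \begin{bmatrix} .6 & 1 \\ .4 & -1 \end{bmatrix}\begin{bmatrix} 1 & 0 \\ 0 & .5 \end{bmatrix}\begin{bmatrix} 1 & 1 \\ .4 & -.6 \end{bmatrix} = X\Lambda X^{-1}$$

The eigenvectors $(.6, .4)$ and $(1, -1)$ are in the columns of $X$. They are also the eigenvectors
of $A^2$. Watch how $A^2$ has the same $X$, and the eigenvalue matrix of $A^2$ is $\Lambda^2$:

$$\text{Same } X \text{ for } A^2 \qquad A^2 = X\Lambda X^{-1}X\Lambda X^{-1} = X\Lambda^2X^{-1}. \qquad (4)$$

Just keep going, and you see why the high powers $A^k$ approach a “steady state”:

$$\text{Powers of } A \qquad A^k = X\Lambda^kX^{-1} = \begin{bmatrix} .6 & 1 \\ .4 & -1 \end{bmatrix}\begin{bmatrix} 1^k & 0 \\ 0 & (.5)^k \end{bmatrix}\begin{bmatrix} 1 & 1 \\ .4 & -.6 \end{bmatrix}$$

As $k$ gets larger, $(.5)^k$ gets smaller. In the limit it disappears completely. That limit is $A^\infty$:

$$\text{Limit } k \to \infty \qquad A^\infty = \begin{bmatrix} .6 & 1 \\ .4 & -1 \end{bmatrix}\begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix}\begin{bmatrix} 1 & 1 \\ .4 & -.6 \end{bmatrix} = \begin{bmatrix} .6 & .6 \\ .4 & .4 \end{bmatrix}$$

The limit has the eigenvector $x_1$ in both columns. We saw this $A^\infty$ on the very first page
of Chapter 6. Now we see it coming from powers like $A^{100} = X\Lambda^{100}X^{-1}$.


Question  When does $A^k \to$ zero matrix?  Answer  All $|\lambda| < 1$.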
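The approach to the steady state is easy to watch on a computer. This Python sketch is an illustration added here, with function names chosen for the example. It computes powers of the Markov matrix two ways: by repeated multiplication and from $X\Lambda^kX^{-1}$ with the eigenvalues 1 and .5.

```python
def matmul(P, Q):
    """Multiply two 2x2 matrices stored as nested lists."""
    return [[sum(P[i][k] * Q[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def markov_power(k):
    """A^k by repeated multiplication, for A = [[.8, .3], [.2, .7]]."""
    A = [[0.8, 0.3], [0.2, 0.7]]
    result = [[1.0, 0.0], [0.0, 1.0]]
    for _ in range(k):
        result = matmul(result, A)
    return result


def markov_power_formula(k):
    """A^k = X Lambda^k X^{-1}: only (.5)^k changes as k grows."""
    X = [[0.6, 1.0], [0.4, -1.0]]        # eigenvectors (.6, .4) and (1, -1)
    X_inv = [[1.0, 1.0], [0.4, -0.6]]    # inverse of X
    Lam_k = [[1.0, 0.0], [0.0, 0.5 ** k]]
    return matmul(matmul(X, Lam_k), X_inv)
```

For large $k$ both functions return a matrix whose columns are close to $(.6, .4)$, because the term $(.5)^k$ has all but disappeared.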


Similar Matrices: Same Eigenvalues 


Suppose the eigenvalue matrix $\Lambda$ is fixed. As we change the eigenvector matrix $X$, we get
a whole family of different matrices $A = X\Lambda X^{-1}$, all with the same eigenvalues in $\Lambda$.
All those matrices $A$ (with the same $\Lambda$) are called similar.

This idea extends to matrices that can’t be diagonalized. Again we choose one constant
matrix $C$ (not necessarily $\Lambda$). And we look at the whole family of matrices $A = BCB^{-1}$,
allowing all invertible matrices $B$. Again those matrices $A$ and $C$ are called similar.

We are using $C$ instead of $\Lambda$ because $C$ might not be diagonal. We are using $B$ instead
of $X$ because the columns of $B$ might not be eigenvectors. We only require that $B$ is
invertible: its columns can contain any basis for $\mathbf{R}^n$. The key fact about similar matrices
stays true. Similar matrices $A$ and $C$ have the same eigenvalues.


All the matrices $A = BCB^{-1}$ are “similar.” They all share the eigenvalues of $C$.


Proof  Suppose $Cx = \lambda x$. Then $BCB^{-1}$ has the same eigenvalue $\lambda$ with the new eigenvector $Bx$:

$$\text{Same } \lambda \qquad (BCB^{-1})(Bx) = BCx = B\lambda x = \lambda(Bx). \qquad (5)$$
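A small numerical experiment can make this proof concrete. In the sketch below the matrices $B$ and $C$ are arbitrary choices made for illustration (they do not come from the text), and the helper names are assumptions. The code forms $A = BCB^{-1}$ exactly and checks that $Bx$ is an eigenvector of $A$ with the eigenvalue that $x$ had for $C$.

```python
from fractions import Fraction


def matmul(P, Q):
    """Multiply two 2x2 matrices stored as nested lists."""
    return [[sum(P[i][k] * Q[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def inverse(B):
    """Inverse of an invertible 2x2 matrix, in exact fractions."""
    (a, b), (c, d) = B
    det = Fraction(a * d - b * c)
    return [[d / det, -b / det], [-c / det, a / det]]


def matvec(M, v):
    """Multiply a 2x2 matrix by a vector with two components."""
    return [M[0][0] * v[0] + M[0][1] * v[1], M[1][0] * v[0] + M[1][1] * v[1]]


def similar(B, C):
    """Return A = B C B^{-1}, a matrix similar to C."""
    return matmul(matmul(B, C), inverse(B))
```

With $C = \begin{bmatrix} 2 & 1 \\ 0 & 3 \end{bmatrix}$, $x = (1, 0)$, $\lambda = 2$ and $B = \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix}$, the vector $Bx = (1, 3)$ satisfies $A(Bx) = 2(Bx)$. The trace 5 and determinant 6 of $C$ are also unchanged.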


A fixed matrix $C$ produces a family of similar matrices $BCB^{-1}$, allowing all $B$.
When $C$ is the identity matrix, the “family” is very small. The only member is $BIB^{-1} = I$.
The identity matrix is the only diagonalizable matrix with all eigenvalues $\lambda = 1$.

The family is larger when $\lambda = 1$ and 1 with only one eigenvector (not diagonalizable).
The simplest $C$ is the Jordan form, to be developed in Section 8.3. All the similar
$A$’s have two parameters $r$ and $s$, not both zero: always determinant = 1 and trace = 2.

$$C = \begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix} = \text{Jordan form gives } A = BCB^{-1} = \begin{bmatrix} 1 - rs & r^2 \\ -s^2 & 1 + rs \end{bmatrix}. \qquad (6)$$

For an important example I will take eigenvalues $\lambda = 1$ and 0 (not repeated!). Now the
whole family is diagonalizable with the same eigenvalue matrix $\Lambda$. We get every 2 by 2
matrix that has eigenvalues 1 and 0. The trace is 1 and the determinant is zero:

$$\text{All similar} \qquad \Lambda = \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix} \quad A = \begin{bmatrix} 1 & 1 \\ 0 & 0 \end{bmatrix} \ \text{ or } \ A = \begin{bmatrix} .5 & .5 \\ .5 & .5 \end{bmatrix} \ \text{ or any } \ A = \frac{xy^T}{x^Ty}.$$

The family contains all matrices with $A^2 = A$, including $A = \Lambda$ when $B = I$. When
$A$ is symmetric these are also projection matrices. Eigenvalues 1 and 0 make life easy.


Fibonacci Numbers 


We present a famous example, where eigenvalues tell how fast the Fibonacci numbers grow. 
Every new Fibonacci number is the sum of the two previous $F$’s:

$$\text{The sequence } 0, 1, 1, 2, 3, 5, 8, 13, \ldots \text{ comes from } F_{k+2} = F_{k+1} + F_k.$$

These numbers turn up in a fantastic variety of applications. Plants and trees grow in a
spiral pattern, and a pear tree has 8 growths for every 3 turns. For a willow those numbers
can be 13 and 5. The champion is a sunflower of Daniel O’Connell, which had 233 seeds
in 144 loops. Those are the Fibonacci numbers $F_{13}$ and $F_{12}$. Our problem is more basic.

Problem: Find the Fibonacci number $F_{100}$  The slow way is to apply the rule
$F_{k+2} = F_{k+1} + F_k$ one step at a time. By adding $F_6 = 8$ to $F_7 = 13$ we reach $F_8 = 21$.
Eventually we come to $F_{100}$. Linear algebra gives a better way.

The key is to begin with a matrix equation $u_{k+1} = Au_k$. That is a one-step rule for
vectors, while Fibonacci gave a two-step rule for scalars. We match those rules by putting
two Fibonacci numbers into a vector. Then you will see the matrix $A$.


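$$\text{Let } u_k = \begin{bmatrix} F_{k+1} \\ F_k \end{bmatrix}. \quad \text{The rule } \ \begin{matrix} F_{k+2} = F_{k+1} + F_k \\ F_{k+1} = F_{k+1} \end{matrix} \ \text{ is } \ u_{k+1} = \begin{bmatrix} 1 & 1 \\ 1 & 0 \end{bmatrix}u_k. \qquad (7)$$

Every step multiplies by $A = \begin{bmatrix} 1 & 1 \\ 1 & 0 \end{bmatrix}$. After 100 steps we reach $u_{100} = A^{100}u_0$:

$$u_0 = \begin{bmatrix} 1 \\ 0 \end{bmatrix}, \quad u_1 = \begin{bmatrix} 1 \\ 1 \end{bmatrix}, \quad u_2 = \begin{bmatrix} 2 \\ 1 \end{bmatrix}, \quad u_3 = \begin{bmatrix} 3 \\ 2 \end{bmatrix}, \quad \ldots, \quad u_{100} = \begin{bmatrix} F_{101} \\ F_{100} \end{bmatrix}.$$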


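This problem is just right for eigenvalues. Subtract $\lambda$ from the diagonal of $A$:

$$A - \lambda I = \begin{bmatrix} 1 - \lambda & 1 \\ 1 & -\lambda \end{bmatrix} \quad \text{leads to} \quad \det(A - \lambda I) = \lambda^2 - \lambda - 1.$$

The equation $\lambda^2 - \lambda - 1 = 0$ is solved by the quadratic formula $\left(-b \pm \sqrt{b^2 - 4ac}\right)/2a$:

$$\text{Eigenvalues} \qquad \lambda_1 = \frac{1 + \sqrt{5}}{2} \approx 1.618 \quad \text{and} \quad \lambda_2 = \frac{1 - \sqrt{5}}{2} \approx -.618.$$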


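These eigenvalues lead to eigenvectors $x_1 = (\lambda_1, 1)$ and $x_2 = (\lambda_2, 1)$. Step 2 finds the
combination of those eigenvectors that gives $u_0 = (1, 0)$:

$$\begin{bmatrix} 1 \\ 0 \end{bmatrix} = \frac{1}{\lambda_1 - \lambda_2}\left(\begin{bmatrix} \lambda_1 \\ 1 \end{bmatrix} - \begin{bmatrix} \lambda_2 \\ 1 \end{bmatrix}\right) \quad \text{or} \quad u_0 = \frac{x_1 - x_2}{\lambda_1 - \lambda_2}. \qquad (8)$$

Step 3 multiplies $u_0$ by $A^{100}$ to find $u_{100}$. The eigenvectors $x_1$ and $x_2$ stay separate!
They are multiplied by $(\lambda_1)^{100}$ and $(\lambda_2)^{100}$:

$$\text{100 steps from } u_0 \qquad u_{100} = \frac{(\lambda_1)^{100}x_1 - (\lambda_2)^{100}x_2}{\lambda_1 - \lambda_2}. \qquad (9)$$

We want $F_{100}$ = second component of $u_{100}$. The second components of $x_1$ and $x_2$ are 1.
The difference between $\lambda_1 = (1 + \sqrt{5})/2$ and $\lambda_2 = (1 - \sqrt{5})/2$ is $\sqrt{5}$. And $\lambda_2^{100} \approx 0$.

$$\text{100th Fibonacci number} = \frac{\lambda_1^{100} - \lambda_2^{100}}{\lambda_1 - \lambda_2} = \text{nearest integer to } \frac{1}{\sqrt{5}}\left(\frac{1 + \sqrt{5}}{2}\right)^{100}. \qquad (10)$$

Every $F_k$ is a whole number. The ratio $F_{101}/F_{100}$ must be very close to the
limiting ratio $(1 + \sqrt{5})/2$. The Greeks called this number the “golden mean”.
For some reason a rectangle with sides 1.618 and 1 looks especially graceful.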
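Both routes to $F_{100}$ can be tried directly. The following Python sketch is an added illustration, with function names that are assumptions. It applies the rule one step at a time, and then takes the nearest integer to $\lambda_1^k/\sqrt{5}$. Ordinary floating point cannot hold all 21 digits of $F_{100}$, so the eigenvalue route uses the standard `decimal` module with extra precision.

```python
from decimal import Decimal, getcontext


def fibonacci_by_rule(k):
    """F_k from the two-step rule F_{k+2} = F_{k+1} + F_k, with F_0 = 0, F_1 = 1."""
    f_prev, f_curr = 0, 1
    for _ in range(k):
        f_prev, f_curr = f_curr, f_prev + f_curr
    return f_prev


def fibonacci_by_eigenvalues(k, digits=60):
    """F_k as the nearest integer to lambda_1^k / sqrt(5)."""
    getcontext().prec = digits           # working precision in decimal digits
    sqrt5 = Decimal(5).sqrt()
    lam1 = (1 + sqrt5) / 2               # the larger eigenvalue, about 1.618
    return int((lam1 ** k / sqrt5).to_integral_value())
```

Both functions return 354224848179261915075 for $k = 100$. The second one never forms the 99 earlier Fibonacci numbers.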


Matrix Powers $A^k$


Fibonacci’s example is a typical difference equation $u_{k+1} = Au_k$. Each step multiplies
by $A$. The solution is $u_k = A^ku_0$. We want to make clear how diagonalizing the matrix
gives a quick way to compute $A^k$ and find $u_k$ in three steps.

The eigenvector matrix $X$ produces $A = X\Lambda X^{-1}$. This is a factorization of the matrix,
like $A = LU$ or $A = QR$. The new factorization is perfectly suited to computing powers,
because every time $X^{-1}$ multiplies $X$ we get $I$:

$$\text{Powers of } A \qquad A^ku_0 = (X\Lambda X^{-1})\cdots(X\Lambda X^{-1})u_0 = X\Lambda^kX^{-1}u_0$$

I will split $X\Lambda^kX^{-1}u_0$ into three steps that show how eigenvalues work:

1. Write $u_0$ as a combination $c_1x_1 + \cdots + c_nx_n$ of the eigenvectors. Then $c = X^{-1}u_0$.

2. Multiply each eigenvector $x_i$ by $(\lambda_i)^k$. Now we have $\Lambda^kX^{-1}u_0$.

3. Add up the pieces $c_i(\lambda_i)^kx_i$ to find the solution $u_k = A^ku_0$. This is $X\Lambda^kX^{-1}u_0$.

$$\text{Solution for } u_{k+1} = Au_k \qquad u_k = A^ku_0 = c_1(\lambda_1)^kx_1 + \cdots + c_n(\lambda_n)^kx_n. \qquad (11)$$

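In matrix language $A^k$ equals $(X\Lambda X^{-1})^k$ which is $X$ times $\Lambda^k$ times $X^{-1}$. In Step 1,
the eigenvectors in $X$ lead to the $c$’s in the combination $u_0 = c_1x_1 + \cdots + c_nx_n$:

$$\text{Step 1} \qquad u_0 = \begin{bmatrix} x_1 & \cdots & x_n \end{bmatrix}\begin{bmatrix} c_1 \\ \vdots \\ c_n \end{bmatrix}. \quad \text{This says that } u_0 = Xc. \qquad (12)$$

The coefficients in Step 1 are $c = X^{-1}u_0$. Then Step 2 multiplies by $\Lambda^k$. The final result
$u_k = \sum c_i(\lambda_i)^kx_i$ in Step 3 is the product of $X$ and $\Lambda^k$ and $X^{-1}u_0$:

$$A^ku_0 = X\Lambda^kX^{-1}u_0 = X\Lambda^kc = \begin{bmatrix} x_1 & \cdots & x_n \end{bmatrix}\begin{bmatrix} (\lambda_1)^k & & \\ & \ddots & \\ & & (\lambda_n)^k \end{bmatrix}\begin{bmatrix} c_1 \\ \vdots \\ c_n \end{bmatrix}. \qquad (13)$$

This result is exactly $u_k = c_1(\lambda_1)^kx_1 + \cdots + c_n(\lambda_n)^kx_n$. It solves $u_{k+1} = Au_k$.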

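Example 3  Start from $u_0 = (1, 0)$. Compute $A^ku_0$ for this faster Fibonacci:

$$A = \begin{bmatrix} 1 & 2 \\ 1 & 0 \end{bmatrix} \text{ has } \lambda_1 = 2 \text{ and } x_1 = \begin{bmatrix} 2 \\ 1 \end{bmatrix}, \quad \lambda_2 = -1 \text{ and } x_2 = \begin{bmatrix} 1 \\ -1 \end{bmatrix}.$$

This matrix is like Fibonacci except the rule is changed to $F_{k+2} = F_{k+1} + 2F_k$.
The new numbers start with 0, 1, 1, 3. They grow faster because of $\lambda = 2$.

Find $u_k = A^ku_0$ in 3 steps:  $u_0 = c_1x_1 + c_2x_2$ and $u_k = c_1(\lambda_1)^kx_1 + c_2(\lambda_2)^kx_2$

Step 1  $u_0 = \begin{bmatrix} 1 \\ 0 \end{bmatrix} = \dfrac{1}{3}\begin{bmatrix} 2 \\ 1 \end{bmatrix} + \dfrac{1}{3}\begin{bmatrix} 1 \\ -1 \end{bmatrix}$ so $c_1 = c_2 = \dfrac{1}{3}$

Step 2  Multiply the two parts by $(\lambda_1)^k = 2^k$ and $(\lambda_2)^k = (-1)^k$

Step 3  Combine eigenvectors $c_1(\lambda_1)^kx_1$ and $c_2(\lambda_2)^kx_2$ into $u_k$:

$$u_k = A^ku_0 = \frac{1}{3}2^k\begin{bmatrix} 2 \\ 1 \end{bmatrix} + \frac{1}{3}(-1)^k\begin{bmatrix} 1 \\ -1 \end{bmatrix} = \begin{bmatrix} F_{k+1} \\ F_k \end{bmatrix}.$$

The new number is $F_k = (2^k - (-1)^k)/3$. After 0, 1, 1, 3 comes $F_4 = 15/3 = 5$.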
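The three steps of Example 3 translate almost line by line into code. This Python sketch is an added illustration with assumed function names. It builds $u_k$ from the eigenvector combination and compares it with stepping $u_{k+1} = Au_k$ directly.

```python
from fractions import Fraction


def u_by_eigenvectors(k):
    """u_k = c1 (lambda1)^k x1 + c2 (lambda2)^k x2 for A = [[1, 2], [1, 0]]."""
    c1 = c2 = Fraction(1, 3)          # Step 1: u0 = (1, 0) = x1/3 + x2/3
    x1, x2 = (2, 1), (1, -1)          # eigenvectors of A
    lam1, lam2 = 2, -1                # eigenvalues of A
    # Step 2 multiplies each piece by its lambda^k; Step 3 adds the pieces.
    return tuple(c1 * lam1 ** k * a + c2 * lam2 ** k * b
                 for a, b in zip(x1, x2))


def u_by_stepping(k):
    """u_k from k applications of u_{k+1} = A u_k, starting at u0 = (1, 0)."""
    u = (1, 0)
    for _ in range(k):
        u = (u[0] + 2 * u[1], u[0])   # rows of A are (1, 2) and (1, 0)
    return u
```

For $k = 4$ both give $u_4 = (11, 5)$, so $F_4 = 5$ as in the text.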


Behind these numerical examples lies a fundamental idea: Follow the eigenvectors.
In Section 6.3 this is the crucial link from linear algebra to differential equations ($\Lambda^k$ will
become $e^{\Lambda t}$). Chapter 8 sees the same idea as “transforming to an eigenvector basis.”
The best example of all is a Fourier series, built from the eigenvectors $e^{ikx}$ of $d/dx$.


Nondiagonalizable Matrices (Optional)

Suppose $\lambda$ is an eigenvalue of $A$. We discover that fact in two ways:

1. Eigenvectors (geometric)  There are nonzero solutions to $Ax = \lambda x$.
2. Eigenvalues (algebraic)  The determinant of $A - \lambda I$ is zero.


The number $\lambda$ may be a simple eigenvalue or a multiple eigenvalue, and we want to know
its multiplicity. Most eigenvalues have multiplicity $M = 1$ (simple eigenvalues). Then
there is a single line of eigenvectors, and $\det(A - \lambda I)$ does not have a double factor.

For exceptional matrices, an eigenvalue can be repeated. Then there are two different
ways to count its multiplicity. Always GM $\leq$ AM for each $\lambda$:


1. (Geometric Multiplicity = GM)  Count the independent eigenvectors for $\lambda$.
Then GM is the dimension of the nullspace of $A - \lambda I$.


2. (Algebraic Multiplicity = AM)  AM counts the repetitions of $\lambda$ among the
eigenvalues. Look at the $n$ roots of $\det(A - \lambda I) = 0$.


If $A$ has $\lambda = 4, 4, 4$, then that eigenvalue has AM = 3 and GM = 1, 2, or 3.
The following matrix $A$ is the standard example of trouble. Its eigenvalue $\lambda = 0$ is
repeated. It is a double eigenvalue (AM = 2) with only one eigenvector (GM = 1).


$$\begin{matrix} \text{AM} = 2 \\ \text{GM} = 1 \end{matrix} \qquad A = \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix} \text{ has } \det(A - \lambda I) = \begin{vmatrix} -\lambda & 1 \\ 0 & -\lambda \end{vmatrix} = \lambda^2. \qquad \begin{matrix} \lambda = 0, 0 \text{ but} \\ \text{1 eigenvector} \end{matrix}$$

There “should” be two eigenvectors, because $\lambda^2 = 0$ has a double root. The double
factor $\lambda^2$ makes AM = 2. But there is only one eigenvector $x = (1, 0)$ and GM = 1.
This shortage of eigenvectors when GM is below AM means that $A$ is not diagonalizable.


These three matrices all have the same shortage of eigenvectors. Their repeated eigenvalue
is $\lambda = 5$. Traces are 10 and determinants are 25:

$$A = \begin{bmatrix} 5 & 1 \\ 0 & 5 \end{bmatrix} \quad \text{and} \quad A = \begin{bmatrix} 6 & -1 \\ 1 & 4 \end{bmatrix} \quad \text{and} \quad A = \begin{bmatrix} 7 & 2 \\ -2 & 3 \end{bmatrix}$$

Those all have $\det(A - \lambda I) = (\lambda - 5)^2$. The algebraic multiplicity is AM = 2. But
each $A - 5I$ has rank $r = 1$. The geometric multiplicity is GM = 1. There is only one
line of eigenvectors for $\lambda = 5$, and these matrices are not diagonalizable.
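The count GM = $n$ minus the rank of $A - \lambda I$ is simple enough to compute by hand for 2 by 2 matrices, and also by machine. The sketch below is an added illustration (the function names are assumptions) and handles only the 2 by 2 case with integer or exact entries.

```python
def rank_2x2(M):
    """Rank of a 2x2 matrix: 2 if invertible, 0 if all zero, otherwise 1."""
    (a, b), (c, d) = M
    if a * d - b * c != 0:
        return 2
    return 0 if a == b == c == d == 0 else 1


def geometric_multiplicity(A, lam):
    """GM = dimension of the nullspace of A - lam*I = 2 - rank(A - lam*I)."""
    shifted = [[A[0][0] - lam, A[0][1]],
               [A[1][0], A[1][1] - lam]]
    return 2 - rank_2x2(shifted)
```

The triangular matrix with rows $(5, 1)$ and $(0, 5)$ has GM = 1 for $\lambda = 5$, while $5I$ has GM = 2 for the same repeated eigenvalue. Only the second is diagonalizable.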


REVIEW OF THE KEY IDEAS


1. If $A$ has $n$ independent eigenvectors $x_1, \ldots, x_n$, they go into the columns of $X$.

   $A$ is diagonalized by $X$: $\quad X^{-1}AX = \Lambda$ and $A = X\Lambda X^{-1}$.

2. The powers of $A$ are $A^k = X\Lambda^kX^{-1}$. The eigenvectors in $X$ are unchanged.

3. The eigenvalues of $A^k$ are $(\lambda_1)^k, \ldots, (\lambda_n)^k$ in the matrix $\Lambda^k$.

4. The solution to $u_{k+1} = Au_k$ starting from $u_0$ is $u_k = A^ku_0 = X\Lambda^kX^{-1}u_0$:

   $u_k = c_1(\lambda_1)^kx_1 + \cdots + c_n(\lambda_n)^kx_n$ provided $u_0 = c_1x_1 + \cdots + c_nx_n$.

   That shows Steps 1, 2, 3 ($c$’s from $X^{-1}u_0$, $\lambda^k$ from $\Lambda^k$, and $x$’s from $X$).

5. $A$ is diagonalizable if every eigenvalue has enough eigenvectors (GM = AM).


WORKED EXAMPLES


6.2 A  The Lucas numbers are like the Fibonacci numbers except they start with
$L_1 = 1$ and $L_2 = 3$. Using the same rule $L_{k+2} = L_{k+1} + L_k$, the next Lucas numbers
are 4, 7, 11, 18. Show that the Lucas number $L_{100}$ is $\lambda_1^{100} + \lambda_2^{100}$.

Solution  $u_{k+1} = \begin{bmatrix} 1 & 1 \\ 1 & 0 \end{bmatrix}u_k$ is the same as for Fibonacci, because $L_{k+2} = L_{k+1} + L_k$
is the same rule (with different starting values). The equation becomes a 2 by 2 system:

$$\text{Let } u_k = \begin{bmatrix} L_{k+1} \\ L_k \end{bmatrix}. \quad \text{The rule } \ \begin{matrix} L_{k+2} = L_{k+1} + L_k \\ L_{k+1} = L_{k+1} \end{matrix} \ \text{ is } \ u_{k+1} = \begin{bmatrix} 1 & 1 \\ 1 & 0 \end{bmatrix}u_k.$$

The eigenvalues and eigenvectors of $A = \begin{bmatrix} 1 & 1 \\ 1 & 0 \end{bmatrix}$ still come from $\lambda^2 = \lambda + 1$:


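$$\lambda_1 = \frac{1 + \sqrt{5}}{2} \text{ and } x_1 = \begin{bmatrix} \lambda_1 \\ 1 \end{bmatrix} \qquad \lambda_2 = \frac{1 - \sqrt{5}}{2} \text{ and } x_2 = \begin{bmatrix} \lambda_2 \\ 1 \end{bmatrix}.$$

Now solve $c_1x_1 + c_2x_2 = u_1 = (3, 1)$. The solution is $c_1 = \lambda_1$ and $c_2 = \lambda_2$. Check:

$$\lambda_1x_1 + \lambda_2x_2 = \begin{bmatrix} \lambda_1^2 + \lambda_2^2 \\ \lambda_1 + \lambda_2 \end{bmatrix} = \begin{bmatrix} \text{trace of } A^2 \\ \text{trace of } A \end{bmatrix} = \begin{bmatrix} 3 \\ 1 \end{bmatrix} = u_1.$$

$u_{100} = A^{99}u_1$ tells us the Lucas numbers $(L_{101}, L_{100})$. The second components of the
eigenvectors $x_1$ and $x_2$ are 1, so the second component of $u_{100}$ is the answer we want:

$$\text{Lucas number} \qquad L_{100} = c_1\lambda_1^{99} + c_2\lambda_2^{99} = \lambda_1^{100} + \lambda_2^{100}.$$

Lucas starts faster than Fibonacci, and ends up larger by a factor near $\sqrt{5}$.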
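Since $\lambda_1^k + \lambda_2^k$ is the trace of $A^k$, the Lucas formula can be tested in whole numbers with no square roots at all. This Python sketch is an added illustration with assumed function names. It compares the trace of $A^k$ with the Lucas numbers produced by the rule.

```python
def fibonacci_matrix_power(k):
    """A^k for the Fibonacci matrix A = [[1, 1], [1, 0]], with integer entries."""
    a, b, c, d = 1, 0, 0, 1                  # start from the identity matrix
    for _ in range(k):
        # Multiply [[a, b], [c, d]] on the right by A.
        a, b, c, d = a + b, a, c + d, c
    return [[a, b], [c, d]]


def lucas_by_rule(k):
    """L_k from L_{k+2} = L_{k+1} + L_k with L_1 = 1 and L_2 = 3 (k >= 1)."""
    current, following = 1, 3
    for _ in range(k - 1):
        current, following = following, current + following
    return current


def lucas_by_trace(k):
    """lambda_1^k + lambda_2^k computed as the trace of A^k."""
    P = fibonacci_matrix_power(k)
    return P[0][0] + P[1][1]
```

The two functions agree for every $k \geq 1$ that was tried, including $k = 100$.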


6.2 B  Find the inverse and the eigenvalues and the determinant of this matrix $A$:

$$A = 5 * \text{eye}(4) - \text{ones}(4) = \begin{bmatrix} 4 & -1 & -1 & -1 \\ -1 & 4 & -1 & -1 \\ -1 & -1 & 4 & -1 \\ -1 & -1 & -1 & 4 \end{bmatrix}$$

Describe an eigenvector matrix $X$ that gives $X^{-1}AX = \Lambda$.


Solution  What are the eigenvalues of the all-ones matrix? Its rank is certainly 1,
so three eigenvalues are $\lambda = 0, 0, 0$. Its trace is 4, so the other eigenvalue is $\lambda = 4$.
Subtract this all-ones matrix from $5I$ to get our matrix $A$:


Subtract the eigenvalues 4, 0, 0, 0 from 5, 5, 5, 5. The eigenvalues of $A$ are 1, 5, 5, 5.


The determinant of $A$ is 125, the product of those four eigenvalues. The eigenvector for
$\lambda = 1$ is $x = (1, 1, 1, 1)$ or $(c, c, c, c)$. The other eigenvectors are perpendicular to $x$
(since $A$ is symmetric). The nicest eigenvector matrix $X$ is the symmetric orthogonal
Hadamard matrix $H$. The factor $\frac{1}{2}$ produces unit column vectors.

$$\text{Orthonormal eigenvectors} \qquad X = H = \frac{1}{2}\begin{bmatrix} 1 & 1 & 1 & 1 \\ 1 & -1 & 1 & -1 \\ 1 & 1 & -1 & -1 \\ 1 & -1 & -1 & 1 \end{bmatrix}$$

The eigenvalues of $A^{-1}$ are $1, \frac{1}{5}, \frac{1}{5}, \frac{1}{5}$. The eigenvectors are not changed so $A^{-1} = H\Lambda^{-1}H^{-1}$. The inverse matrix is surprisingly neat:

$$A^{-1} = \frac{1}{5} * (\text{eye}(4) + \text{ones}(4)) = \frac{1}{5}\begin{bmatrix} 2 & 1 & 1 & 1 \\ 1 & 2 & 1 & 1 \\ 1 & 1 & 2 & 1 \\ 1 & 1 & 1 & 2 \end{bmatrix}$$

$A$ is a rank-one change from $5I$. So $A^{-1}$ is a rank-one change from $I/5$.

In a graph with 5 nodes, the determinant 125 counts the “spanning trees” (trees that
touch all nodes). Trees have no loops (graphs and trees are in Section 10.1).

With 6 nodes, the matrix 6 * eye(5) − ones(5) has the five eigenvalues 1, 6, 6, 6, 6.
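The determinant and the neat inverse can be confirmed exactly. The text writes the matrix in MATLAB notation; the Python sketch below is an added illustration that uses only the standard library, and its helper names are assumptions. It builds $A = 5I - \text{ones}$, multiplies by $\frac{1}{5}(I + \text{ones})$, and finds the determinant by elimination.

```python
from fractions import Fraction


def build_A(n=4):
    """(n + 1) * eye(n) - ones(n); for n = 4 this is the matrix of Example 6.2 B."""
    return [[(n + 1 if i == j else 0) - 1 for j in range(n)] for i in range(n)]


def build_A_inverse(n=4):
    """(eye(n) + ones(n)) / (n + 1), the claimed inverse."""
    return [[Fraction((1 if i == j else 0) + 1, n + 1) for j in range(n)]
            for i in range(n)]


def matmul(P, Q):
    """Multiply two square matrices stored as nested lists."""
    n = len(P)
    return [[sum(P[i][k] * Q[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def determinant(M):
    """Determinant by elimination with exact fractions (product of the pivots)."""
    M = [[Fraction(v) for v in row] for row in M]
    n, det = len(M), Fraction(1)
    for col in range(n):
        pivot = next((r for r in range(col, n) if M[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != col:
            M[col], M[pivot] = M[pivot], M[col]
            det = -det                       # a row exchange reverses the sign
        det *= M[col][col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            M[r] = [a - factor * b for a, b in zip(M[r], M[col])]
    return det
```

For $n = 4$ the product is the identity and the determinant is 125. For $n = 5$ the same code gives $6^4 = 1296$, matching the eigenvalues 1, 6, 6, 6, 6.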


Problem Set 6.2 


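Questions 1-7 are about the eigenvalue and eigenvector matrices $\Lambda$ and $X$.


1 (a) Factor these two matrices into $A = X\Lambda X^{-1}$:

$$A = \begin{bmatrix} 1 & 2 \\ 0 & 3 \end{bmatrix} \quad \text{and} \quad A = \begin{bmatrix} 1 & 1 \\ 3 & 3 \end{bmatrix}.$$

(b) If $A = X\Lambda X^{-1}$ then $A^3 = (\ )(\ )(\ )$ and $A^{-1} = (\ )(\ )(\ )$.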


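2 If $A$ has $\lambda_1 = 2$ with eigenvector $x_1 = \begin{bmatrix} 1 \\ 0 \end{bmatrix}$ and $\lambda_2 = 5$ with $x_2 = \begin{bmatrix} 1 \\ 1 \end{bmatrix}$,
use $X\Lambda X^{-1}$ to find $A$. No other matrix has the same $\lambda$’s and $x$’s.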


3 Suppose $A = X\Lambda X^{-1}$. What is the eigenvalue matrix for $A + 2I$? What is the
eigenvector matrix? Check that $A + 2I = (\ )(\ )(\ )^{-1}$.


4 True or false: If the columns of X (eigenvectors of A) are linearly independent, then 


(a) $A$ is invertible (b) $A$ is diagonalizable
(c) $X$ is invertible (d) $X$ is diagonalizable.


5 If the eigenvectors of $A$ are the columns of $I$, then $A$ is a ___ matrix. If the eigenvector
matrix $X$ is triangular, then $X^{-1}$ is triangular. Prove that $A$ is also triangular.


6 Describe all matrices $X$ that diagonalize this matrix $A$ (find all eigenvectors):

$$A = \begin{bmatrix} 4 & 0 \\ 1 & 2 \end{bmatrix}.$$

Then describe all matrices that diagonalize $A^{-1}$.


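7 Write down the most general matrix that has eigenvectors $\begin{bmatrix} 1 \\ 1 \end{bmatrix}$ and $\begin{bmatrix} 1 \\ -1 \end{bmatrix}$.

Questions 8-10 are about Fibonacci and Gibonacci numbers.


8 Diagonalize the Fibonacci matrix by completing $X^{-1}$:

$$\begin{bmatrix} 1 & 1 \\ 1 & 0 \end{bmatrix} = \begin{bmatrix} \lambda_1 & \lambda_2 \\ 1 & 1 \end{bmatrix}\begin{bmatrix} \lambda_1 & 0 \\ 0 & \lambda_2 \end{bmatrix}\begin{bmatrix} \ & \ \\ \ & \ \end{bmatrix}.$$

Do the multiplication $X\Lambda^kX^{-1}\begin{bmatrix} 1 \\ 0 \end{bmatrix}$ to find its second component. This is the $k$th
Fibonacci number $F_k = (\lambda_1^k - \lambda_2^k)/(\lambda_1 - \lambda_2)$.


9 Suppose $G_{k+2}$ is the average of the two previous numbers $G_{k+1}$ and $G_k$:

$$\begin{matrix} G_{k+2} = \frac{1}{2}G_{k+1} + \frac{1}{2}G_k \\ G_{k+1} = G_{k+1} \end{matrix} \quad \text{is} \quad \begin{bmatrix} G_{k+2} \\ G_{k+1} \end{bmatrix} = \begin{bmatrix} \ & A & \ \end{bmatrix}\begin{bmatrix} G_{k+1} \\ G_k \end{bmatrix}.$$

(a) Find the eigenvalues and eigenvectors of $A$.
(b) Find the limit as $n \to \infty$ of the matrices $A^n = X\Lambda^nX^{-1}$.
(c) If $G_0 = 0$ and $G_1 = 1$ show that the Gibonacci numbers approach $\frac{2}{3}$.


10 Prove that every third Fibonacci number in 0, 1, 1, 2, 3, ... is even.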

Questions 11-14 are about diagonalizability. 

11 True or false: If the eigenvalues of A are 2, 2,5 then the matrix is certainly 
(a) invertible (b) diagonalizable (c) not diagonalizable. 


12 True or false: If the only eigenvectors of $A$ are multiples of $(1, 4)$ then $A$ has


(a) no inverse (b) a repeated eigenvalue (c) no diagonalization $X\Lambda X^{-1}$.


13 Complete these matrices so that $\det A = 25$. Then check that $\lambda = 5$ is repeated:
the trace is 10 so the determinant of $A - \lambda I$ is $(\lambda - 5)^2$. Find an eigenvector with
$Ax = 5x$. These matrices will not be diagonalizable because there is no second line
of eigenvectors.


8 _|9 4 _ {10 5 
a=] 3 and a=] l and e | 


14 The matrix $A = \begin{bmatrix} 3 & 1 \\ 0 & 3 \end{bmatrix}$ is not diagonalizable because the rank of $A - 3I$ is ___.
Change one entry to make $A$ diagonalizable. Which entries could you change?
Questions 15-19 are about powers of matrices.


15 $A^k = X\Lambda^kX^{-1}$ approaches the zero matrix as $k \to \infty$ if and only if every $\lambda$ has
absolute value less than ___. Which of these matrices has $A^k \to 0$?


16 (Recommended) Find $\Lambda$ and $X$ to diagonalize $A_1$ in Problem 15. What is the limit
of $\Lambda^k$ as $k \to \infty$? What is the limit of $X\Lambda^kX^{-1}$? In the columns of this limiting
matrix you see the ___.


17 Find $\Lambda$ and $X$ to diagonalize $A_2$ in Problem 15. What is $(A_2)^{10}u_0$ for these $u_0$?

$$u_0 = \begin{bmatrix} 3 \\ 1 \end{bmatrix} \quad \text{and} \quad u_0 = \begin{bmatrix} 3 \\ -1 \end{bmatrix} \quad \text{and} \quad u_0 = \begin{bmatrix} 6 \\ 0 \end{bmatrix}.$$


18 Diagonalize $A$ and compute $X\Lambda^kX^{-1}$ to prove this formula for $A^k$:

$$A = \begin{bmatrix} 2 & -1 \\ -1 & 2 \end{bmatrix} \quad \text{has} \quad A^k = \frac{1}{2}\begin{bmatrix} 1 + 3^k & 1 - 3^k \\ 1 - 3^k & 1 + 3^k \end{bmatrix}.$$


19 Diagonalize $B$ and compute $X\Lambda^kX^{-1}$ to prove this formula for $B^k$:

$$B = \begin{bmatrix} 5 & 1 \\ 0 & 4 \end{bmatrix} \quad \text{has} \quad B^k = \begin{bmatrix} 5^k & 5^k - 4^k \\ 0 & 4^k \end{bmatrix}.$$


20 Suppose $A = X\Lambda X^{-1}$. Take determinants to prove $\det A = \det\Lambda = \lambda_1\lambda_2\cdots\lambda_n$.
This quick proof only works when $A$ can be ___.


21 Show that trace $XY$ = trace $YX$, by adding the diagonal entries of $XY$ and $YX$:

$$X = \begin{bmatrix} a & b \\ c & d \end{bmatrix} \quad \text{and} \quad Y = \begin{bmatrix} q & r \\ s & t \end{bmatrix}.$$

Now choose $Y$ to be $\Lambda X^{-1}$. Then $X\Lambda X^{-1}$ has the same trace as $\Lambda X^{-1}X = \Lambda$.
This proves that the trace of $A$ equals the trace of $\Lambda$ = sum of the eigenvalues.


22 $AB - BA = I$ is impossible since the left side has trace = ___. But find an
elimination matrix so that $A = E$ and $B = E^T$ give

$$AB - BA = \begin{bmatrix} -1 & 0 \\ 0 & 1 \end{bmatrix} \quad \text{which has trace zero.}$$


23 If $A = X\Lambda X^{-1}$, diagonalize the block matrix $B = \begin{bmatrix} A & 0 \\ 0 & 2A \end{bmatrix}$. Find its eigenvalue and
eigenvector (block) matrices.


24 Consider all 4 by 4 matrices $A$ that are diagonalized by the same fixed eigenvector
matrix $X$. Show that the $A$’s form a subspace ($cA$ and $A_1 + A_2$ have this same $X$).
What is this subspace when $X = I$? What is its dimension?


25 Suppose $A^2 = A$. On the left side $A$ multiplies each column of $A$. Which of our
four subspaces contains eigenvectors with $\lambda = 1$? Which subspace contains
eigenvectors with $\lambda = 0$? From the dimensions of those subspaces, $A$ has a full
set of independent eigenvectors. So a matrix with $A^2 = A$ can be diagonalized.


26 (Recommended) Suppose $Ax = \lambda x$. If $\lambda = 0$ then $x$ is in the nullspace. If $\lambda \neq 0$
then $x$ is in the column space. Those spaces have dimensions $(n - r) + r = n$. So
why doesn’t every square matrix have $n$ linearly independent eigenvectors?


27 The eigenvalues of $A$ are 1 and 9, and the eigenvalues of $B$ are $-1$ and 9:

$$A = \begin{bmatrix} 5 & 4 \\ 4 & 5 \end{bmatrix} \quad \text{and} \quad B = \begin{bmatrix} 4 & 5 \\ 5 & 4 \end{bmatrix}.$$

Find a matrix square root of $A$ from $R = X\sqrt{\Lambda}\,X^{-1}$. Why is there no real matrix
square root of $B$?


28 If $A$ and $B$ have the same $\lambda$’s with the same independent eigenvectors, their factorizations
into ___ are the same. So $A = B$.


29 Suppose the same $X$ diagonalizes both $A$ and $B$. They have the same eigenvectors
in $A = X\Lambda_1X^{-1}$ and $B = X\Lambda_2X^{-1}$. Prove that $AB = BA$.


30 (a) If $A = \begin{bmatrix} a & b \\ 0 & d \end{bmatrix}$ then the determinant of $A - \lambda I$ is $(\lambda - a)(\lambda - d)$. Check the
“Cayley-Hamilton Theorem” that $(A - aI)(A - dI)$ = zero matrix.


(b) Test the Cayley-Hamilton Theorem on Fibonacci’s $A = \begin{bmatrix} 1 & 1 \\ 1 & 0 \end{bmatrix}$. The theorem
predicts that $A^2 - A - I = 0$, since the polynomial $\det(A - \lambda I)$ is $\lambda^2 - \lambda - 1$.


31 Substitute $A = X\Lambda X^{-1}$ into the product $(A - \lambda_1I)(A - \lambda_2I)\cdots(A - \lambda_nI)$ and
explain why this produces the zero matrix. We are substituting the matrix $A$ for the
number $\lambda$ in the polynomial $p(\lambda) = \det(A - \lambda I)$. The Cayley-Hamilton Theorem
says that this product is always $p(A)$ = zero matrix, even if $A$ is not diagonalizable.


32 If $A = \begin{bmatrix} 1 & 0 \\ 0 & 2 \end{bmatrix}$ and $AB = BA$, show that $B = \begin{bmatrix} a & b \\ c & d \end{bmatrix}$ is also a diagonal matrix. $B$
has the same eigen___ as $A$ but different eigen___. These diagonal matrices
$B$ form a two-dimensional subspace of matrix space. $AB - BA = 0$ gives four
equations for the unknowns $a, b, c, d$. Find the rank of the 4 by 4 matrix.


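33 The powers $A^k$ approach zero if all $|\lambda_i| < 1$ and they blow up if any $|\lambda_i| > 1$.
Peter Lax gives these striking examples in his book Linear Algebra:

$$A = \begin{bmatrix} 3 & 2 \\ 1 & 4 \end{bmatrix} \qquad B = \begin{bmatrix} 3 & 2 \\ -5 & -3 \end{bmatrix} \qquad C = \begin{bmatrix} 5 & 7 \\ -3 & -4 \end{bmatrix} \qquad D = \begin{bmatrix} 5 & 6.9 \\ -3 & -4 \end{bmatrix}$$

$$\|A^{1024}\| > 10^{700} \qquad B^{1024} = I \qquad C^{1024} = -C \qquad \|D^{1024}\| < 10^{-78}$$

Find the eigenvalues $\lambda = e^{i\theta}$ of $B$ and $C$ to show $B^4 = I$ and $C^3 = -I$.

Challenge Problems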


34 The $n$th power of rotation through $\theta$ is rotation through $n\theta$:

$$A^n = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix}^n = \begin{bmatrix} \cos n\theta & -\sin n\theta \\ \sin n\theta & \cos n\theta \end{bmatrix}.$$

Prove that neat formula by diagonalizing $A = X\Lambda X^{-1}$. The eigenvectors (columns
of $X$) are $(1, i)$ and $(i, 1)$. You need to know Euler’s formula $e^{i\theta} = \cos\theta + i\sin\theta$.


35 The transpose of $A = X\Lambda X^{-1}$ is $A^T = (X^{-1})^T\Lambda X^T$. The eigenvectors in $A^Ty = \lambda y$
are the columns of that matrix $(X^{-1})^T$. They are often called left eigenvectors of
$A$, because $y^TA = \lambda y^T$. How do you multiply matrices to find this formula for $A$?


36 The inverse of $A$ = eye(n) + ones(n) is $A^{-1}$ = eye(n) + C * ones(n). Multiply
$AA^{-1}$ to find that number $C$ (depending on $n$).


37 Suppose $A_1$ and $A_2$ are $n$ by $n$ invertible matrices. What matrix $B$ shows that $A_2A_1 = B(A_1A_2)B^{-1}$? Then $A_2A_1$ is similar to $A_1A_2$: same eigenvalues.

38 When is a matrix $A$ similar to its eigenvalue matrix $\Lambda$?
$A$ and $\Lambda$ always have the same eigenvalues. But similarity requires a matrix $B$ with
$A = B\Lambda B^{-1}$. Then $B$ is the ___ matrix and $A$ must have $n$ independent ___.

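39 (Pavel Grinfeld) Without writing down any calculations, can you find the eigenvalues
of this matrix? Can you find the 2017th power $A^{2017}$?

$$A = \begin{bmatrix} 110 & 55 & -164 \\ 42 & 21 & -62 \\ 88 & 44 & -131 \end{bmatrix}$$


If $A$ is $m$ by $n$ and $B$ is $n$ by $m$, then $AB$ and $BA$ have the same nonzero eigenvalues.


Proof. Start with this identity between square matrices (easily checked). The first and
third matrices are inverses. The “size matrix” shows the shapes of all blocks.

$$\begin{bmatrix} I & -A \\ 0 & I \end{bmatrix}\begin{bmatrix} AB & 0 \\ B & 0 \end{bmatrix}\begin{bmatrix} I & A \\ 0 & I \end{bmatrix} = \begin{bmatrix} 0 & 0 \\ B & BA \end{bmatrix} \qquad \begin{bmatrix} m \times m & m \times n \\ n \times m & n \times n \end{bmatrix}$$

This equation $D^{-1}ED = F$ says $F$ is similar to $E$: they have the same $m + n$ eigenvalues.

$$E = \begin{bmatrix} AB & 0 \\ B & 0 \end{bmatrix} \text{ has the } m \text{ eigenvalues of } AB, \text{ plus } n \text{ zeros}$$

$$F = \begin{bmatrix} 0 & 0 \\ B & BA \end{bmatrix} \text{ has the } n \text{ eigenvalues of } BA, \text{ plus } m \text{ zeros}$$

So $AB$ and $BA$ have the same eigenvalues except for $|n - m|$ zeros. Wow.


If $A = \begin{bmatrix} 1 & 1 \end{bmatrix}$ and $B = A^T$ then $A^TA = \begin{bmatrix} 1 & 1 \\ 1 & 1 \end{bmatrix}$ (notice $\lambda = 2$ and 0) and $AA^T = \begin{bmatrix} 2 \end{bmatrix}$.

6.3 Systems of Differential Equations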


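1 If $Ax = \lambda x$ then $u(t) = e^{\lambda t}x$ will solve $\dfrac{du}{dt} = Au$. Each $\lambda$ and $x$ give a solution $e^{\lambda t}x$.

2 If $A = X\Lambda X^{-1}$ then $u(t) = e^{At}u(0) = Xe^{\Lambda t}X^{-1}u(0) = c_1e^{\lambda_1t}x_1 + \cdots + c_ne^{\lambda_nt}x_n$.

3 $A$ is stable and $u(t) \to 0$ and $e^{At} \to 0$ when all eigenvalues of $A$ have real part $< 0$.

4 Matrix exponential $e^{At} = I + At + \cdots + (At)^n/n! + \cdots = Xe^{\Lambda t}X^{-1}$ if $A$ is diagonalizable.

$$\begin{matrix} \text{Second order equation} \\ \text{First order system} \end{matrix} \qquad u'' + Bu' + Cu = 0 \ \text{ is equivalent to } \ \begin{bmatrix} u \\ u' \end{bmatrix}' = \begin{bmatrix} 0 & 1 \\ -C & -B \end{bmatrix}\begin{bmatrix} u \\ u' \end{bmatrix}.$$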

Eigenvalues and eigenvectors and $A = X\Lambda X^{-1}$ are perfect for matrix powers $A^k$.
They are also perfect for differential equations $du/dt = Au$. This section is mostly linear
algebra, but to read it you need one fact from calculus: The derivative of $e^{\lambda t}$ is $\lambda e^{\lambda t}$. The
whole point of the section is this: To convert constant-coefficient differential equations
into linear algebra.

The ordinary equations $\dfrac{du}{dt} = u$ and $\dfrac{du}{dt} = \lambda u$ are solved by exponentials:

$$\frac{du}{dt} = u \ \text{ produces } \ u(t) = Ce^t \qquad \frac{du}{dt} = \lambda u \ \text{ produces } \ u(t) = Ce^{\lambda t} \qquad (1)$$

At time $t = 0$ those solutions include $e^0 = 1$. So they both reduce to $u(0) = C$. This
“initial value” tells us the right choice for $C$. The solutions that start from the number
$u(0)$ at time $t = 0$ are $u(t) = u(0)e^t$ and $u(t) = u(0)e^{\lambda t}$.

We just solved a 1 by 1 problem. Linear algebra moves to $n$ by $n$. The unknown is
a vector $u$ (now boldface). It starts from the initial vector $u(0)$, which is given. The $n$
equations contain a square matrix $A$. We expect $n$ exponents $e^{\lambda t}$ in $u(t)$, from $n$ $\lambda$’s:

$$\begin{matrix} \text{System of} \\ n \text{ equations} \end{matrix} \qquad \frac{du}{dt} = Au \ \text{ starting from the vector } \ u(0) = \begin{bmatrix} u_1(0) \\ \vdots \\ u_n(0) \end{bmatrix} \text{ at } t = 0. \qquad (2)$$


These differential equations are linear. If $u(t)$ and $v(t)$ are solutions, so is $Cu(t) + Dv(t)$.
We will need $n$ constants like $C$ and $D$ to match the $n$ components of $u(0)$. Our first job
is to find $n$ “pure exponential solutions” $u = e^{\lambda t}x$ by using $Ax = \lambda x$.

Notice that $A$ is a constant matrix. In other linear equations, $A$ changes as $t$ changes.
In nonlinear equations, $A$ changes as $u$ changes. We don’t have those difficulties:
$du/dt = Au$ is “linear with constant coefficients”. Those and only those are the differential
equations that we will convert directly to linear algebra. Here is the key:


Solve linear constant coefficient equations by exponentials $e^{\lambda t}x$, when $Ax = \lambda x$.

Solution of $du/dt = Au$


Our pure exponential solution will be $e^{\lambda t}$ times a fixed vector $x$. You may guess that $\lambda$ is
an eigenvalue of $A$, and $x$ is the eigenvector. Substitute $u(t) = e^{\lambda t}x$ into the equation
$du/dt = Au$ to prove you are right. The factor $e^{\lambda t}$ will cancel to leave $\lambda x = Ax$:

$$\begin{matrix} \text{Choose } u = e^{\lambda t}x \\ \text{when } Ax = \lambda x \end{matrix} \qquad \frac{du}{dt} = \lambda e^{\lambda t}x \ \text{ agrees with } \ Au = Ae^{\lambda t}x \qquad (3)$$

All components of this special solution $u = e^{\lambda t}x$ share the same $e^{\lambda t}$. The solution
grows when $\lambda > 0$. It decays when $\lambda < 0$. If $\lambda$ is a complex number, its real part decides
growth or decay. The imaginary part $\omega$ gives oscillation $e^{i\omega t}$ like a sine wave.


Example 1  Solve $\dfrac{du}{dt} = Au = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}u$ starting from $u(0) = \begin{bmatrix} 4 \\ 2 \end{bmatrix}$.

This is a vector equation for $u$. It contains two scalar equations for the components $y$ and $z$.
They are “coupled together” because the matrix $A$ is not diagonal:

$$\frac{du}{dt} = Au \qquad \frac{d}{dt}\begin{bmatrix} y \\ z \end{bmatrix} = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}\begin{bmatrix} y \\ z \end{bmatrix} \ \text{ means that } \ \frac{dy}{dt} = z \ \text{ and } \ \frac{dz}{dt} = y.$$

The idea of eigenvectors is to combine those equations in a way that gets back to 1 by 1
problems. The combinations $y + z$ and $y - z$ will do it. Add and subtract equations:

$$\frac{d}{dt}(y + z) = z + y \quad \text{and} \quad \frac{d}{dt}(y - z) = -(y - z).$$

The combination $y + z$ grows like $e^t$, because it has $\lambda = 1$. The combination $y - z$ decays
like $e^{-t}$, because it has $\lambda = -1$. Here is the point: We don’t have to juggle the original
equations $du/dt = Au$, looking for these special combinations. The eigenvectors and
eigenvalues of $A$ will do it for us.

This matrix $A$ has eigenvalues 1 and $-1$. The eigenvectors $x$ are $(1, 1)$ and $(1, -1)$.
The pure exponential solutions $u_1$ and $u_2$ take the form $e^{\lambda t}x$ with $\lambda_1 = 1$ and $\lambda_2 = -1$:

$$u_1(t) = e^{\lambda_1t}x_1 = e^t\begin{bmatrix} 1 \\ 1 \end{bmatrix} \quad \text{and} \quad u_2(t) = e^{\lambda_2t}x_2 = e^{-t}\begin{bmatrix} 1 \\ -1 \end{bmatrix}. \qquad (4)$$

Notice: These $u$’s satisfy $Au_1 = u_1$ and $Au_2 = -u_2$, just like $x_1$ and $x_2$. The factors $e^t$
and $e^{-t}$ change with time. Those factors give $du_1/dt = u_1 = Au_1$ and $du_2/dt = -u_2 = Au_2$.
We have two solutions to $du/dt = Au$. To find all other solutions, multiply those
special solutions by any numbers $C$ and $D$ and add:

$$\text{Complete solution} \qquad u(t) = Ce^t\begin{bmatrix} 1 \\ 1 \end{bmatrix} + De^{-t}\begin{bmatrix} 1 \\ -1 \end{bmatrix} = \begin{bmatrix} Ce^t + De^{-t} \\ Ce^t - De^{-t} \end{bmatrix}. \qquad (5)$$

With these two constants $C$ and $D$, we can match any starting vector $u(0) = (u_1(0), u_2(0))$.
Set $t = 0$ and $e^0 = 1$. Example 1 asked for the initial value to be $u(0) = (4, 2)$:

$$u(0) \text{ decides } C, D \qquad C\begin{bmatrix} 1 \\ 1 \end{bmatrix} + D\begin{bmatrix} 1 \\ -1 \end{bmatrix} = \begin{bmatrix} 4 \\ 2 \end{bmatrix} \ \text{ yields } \ C = 3 \ \text{ and } \ D = 1.$$

With $C = 3$ and $D = 1$ in the solution (5), the initial value problem is completely solved.
The same three steps that solved $u_{k+1} = Au_k$ now solve $du/dt = Au$:

1. Write $u(0)$ as a combination $c_1x_1 + \cdots + c_nx_n$ of the eigenvectors of $A$.

2. Multiply each eigenvector $x_i$ by its growth factor $e^{\lambda_it}$.

3. The solution is the same combination of those pure solutions $e^{\lambda t}x$:

$$\frac{du}{dt} = Au \qquad u(t) = c_1e^{\lambda_1t}x_1 + \cdots + c_ne^{\lambda_nt}x_n. \qquad (6)$$

Not included: If two $\lambda$’s are equal, with only one eigenvector, another solution is needed.
(It will be $te^{\lambda t}x$.) Step 1 needs to diagonalize $A = X\Lambda X^{-1}$: a basis of $n$ eigenvectors.

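Example 2  Solve $du/dt = Au$ knowing the eigenvalues $\lambda = 1, 2, 3$ of $A$:

$$\begin{matrix} \text{Typical example} \\ \text{Equation for } u \\ \text{Initial condition } u(0) \end{matrix} \qquad \frac{du}{dt} = \begin{bmatrix} 1 & 1 & 1 \\ 0 & 2 & 1 \\ 0 & 0 & 3 \end{bmatrix}u \ \text{ starting from } \ u(0) = \begin{bmatrix} 9 \\ 7 \\ 4 \end{bmatrix}$$

The eigenvectors are $x_1 = (1, 0, 0)$ and $x_2 = (1, 1, 0)$ and $x_3 = (1, 1, 1)$.

Step 1  The vector $u(0) = (9, 7, 4)$ is $2x_1 + 3x_2 + 4x_3$. Thus $(c_1, c_2, c_3) = (2, 3, 4)$.

Step 2  The factors $e^{\lambda t}$ give exponential solutions $e^tx_1$ and $e^{2t}x_2$ and $e^{3t}x_3$.

Step 3  The combination that starts from $u(0)$ is $u(t) = 2e^tx_1 + 3e^{2t}x_2 + 4e^{3t}x_3$.

The coefficients 2, 3, 4 came from solving the linear equation $c_1x_1 + c_2x_2 + c_3x_3 = u(0)$:

$$\begin{bmatrix} x_1 & x_2 & x_3 \end{bmatrix}\begin{bmatrix} c_1 \\ c_2 \\ c_3 \end{bmatrix} = \begin{bmatrix} 1 & 1 & 1 \\ 0 & 1 & 1 \\ 0 & 0 & 1 \end{bmatrix}\begin{bmatrix} 2 \\ 3 \\ 4 \end{bmatrix} = \begin{bmatrix} 9 \\ 7 \\ 4 \end{bmatrix} \ \text{ which is } \ Xc = u(0). \qquad (7)$$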
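Here the eigenvector matrix $X$ is triangular, so Step 1 is just back substitution. The following Python sketch is an added illustration with assumed names. It solves $Xc = u(0)$, builds $u(t)$ from the three pure exponentials, and lets you compare $du/dt$ with $Au$.

```python
import math

A = [[1, 1, 1], [0, 2, 1], [0, 0, 3]]       # matrix of Example 2
X = [[1, 1, 1], [0, 1, 1], [0, 0, 1]]       # eigenvectors x1, x2, x3 as columns
EIGENVALUES = [1, 2, 3]


def back_substitute(U, b):
    """Solve U c = b for an upper triangular U, starting from the last row."""
    n = len(b)
    c = [0.0] * n
    for i in range(n - 1, -1, -1):
        known = sum(U[i][j] * c[j] for j in range(i + 1, n))
        c[i] = (b[i] - known) / U[i][i]
    return c


def solution(t, u0):
    """u(t) = sum of c_j e^{lambda_j t} x_j, with c from X c = u(0)."""
    c = back_substitute(X, u0)
    return [sum(c[j] * math.exp(EIGENVALUES[j] * t) * X[i][j] for j in range(3))
            for i in range(3)]


def times_A(v):
    """Multiply the matrix A by a vector with three components."""
    return [sum(A[i][j] * v[j] for j in range(3)) for i in range(3)]
```

With $u(0) = (9, 7, 4)$ the coefficients come out as $(2, 3, 4)$, and a centered difference of `solution` matches `times_A(solution(t, u0))`.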


You now have the basic idea—how to solve du/dt = Au. The rest of this section goes 
further. We solve equations that contain second derivatives, because they arise so often in 
applications. We also decide whether u(t) approaches zero or blows up or just oscillates. 

At the end comes the matrix exponential $e^{At}$. The short formula $e^{At}u(0)$ solves the equation $du/dt = Au$ in the same way that $A^ku_0$ solves the equation $u_{k+1} = Au_k$. Example 3 will show how “difference equations” help to solve differential equations.

All these steps use the $\lambda$'s and the $x$'s. This section solves the constant coefficient problems that turn into linear algebra. It clarifies these simplest but most important differential equations, whose solution is completely based on growth factors $e^{\lambda t}$.


Second Order Equations 


The most important equation in mechanics is $my'' + by' + ky = 0$. The first term is the mass $m$ times the acceleration $a = y''$. This term $ma$ balances the force $F$ (that is Newton's Law). The force includes the damping $-by'$ and the elastic force $-ky$, proportional to distance moved. This is a second-order equation because it contains the second derivative $y'' = d^2y/dt^2$. It is still linear with constant coefficients $m, b, k$.

In a differential equations course, the method of solution is to substitute $y = e^{\lambda t}$. Each derivative of $y$ brings down a factor $\lambda$. We want $y = e^{\lambda t}$ to solve the equation:

$$m\frac{d^2y}{dt^2} + b\frac{dy}{dt} + ky = 0 \quad\text{becomes}\quad (m\lambda^2 + b\lambda + k)\,e^{\lambda t} = 0. \quad (8)$$

Everything depends on $m\lambda^2 + b\lambda + k = 0$. This equation for $\lambda$ has two roots $\lambda_1$ and $\lambda_2$. Then the equation for $y$ has two pure solutions $y_1 = e^{\lambda_1 t}$ and $y_2 = e^{\lambda_2 t}$. Their combinations $c_1y_1 + c_2y_2$ give the complete solution unless $\lambda_1 = \lambda_2$.


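In a linear algebra course we expect matrices and eigenvalues. Therefore we turn the scalar equation (with $y''$) into a vector equation for $y$ and $y'$: first derivative only. Suppose the mass is $m = 1$. Two equations for $u = (y, y')$ give $du/dt = Au$:

$$\begin{aligned} dy/dt &= y' \\ dy'/dt &= -ky - by' \end{aligned} \quad\text{converts to}\quad \frac{d}{dt}\begin{bmatrix}y\\y'\end{bmatrix} = \begin{bmatrix}0&1\\-k&-b\end{bmatrix}\begin{bmatrix}y\\y'\end{bmatrix} = Au. \quad (9)$$

The first equation $dy/dt = y'$ is trivial (but true). The second is equation (8) connecting $y''$ to $y'$ and $y$. Together they connect $u'$ to $u$. So we solve $u' = Au$ by eigenvalues of $A$:

$$A - \lambda I = \begin{bmatrix}-\lambda & 1\\ -k & -b-\lambda\end{bmatrix} \quad\text{has determinant}\quad \lambda^2 + b\lambda + k = 0.$$

The equation for the $\lambda$'s is the same as (8)! It is still $\lambda^2 + b\lambda + k = 0$, since $m = 1$. The roots $\lambda_1$ and $\lambda_2$ are now eigenvalues of $A$. The eigenvectors and the solution are

$$x_1 = \begin{bmatrix}1\\\lambda_1\end{bmatrix} \quad x_2 = \begin{bmatrix}1\\\lambda_2\end{bmatrix} \quad u(t) = c_1e^{\lambda_1 t}\begin{bmatrix}1\\\lambda_1\end{bmatrix} + c_2e^{\lambda_2 t}\begin{bmatrix}1\\\lambda_2\end{bmatrix}.$$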


The first component of $u(t)$ has $y = c_1e^{\lambda_1 t} + c_2e^{\lambda_2 t}$, the same solution as before. It can't be anything else. In the second component of $u(t)$ you see the velocity $dy/dt$. The vector problem is completely consistent with the scalar problem. The 2 by 2 matrix $A$ is called a companion matrix: a companion to the second order equation with $y''$.

Example 3 Motion around a circle with $y'' + y = 0$ and $y = \cos t$

This is our master equation with mass $m = 1$ and stiffness $k = 1$ and $b = 0$: no damping. Substitute $y = e^{\lambda t}$ into $y'' + y = 0$ to reach $\lambda^2 + 1 = 0$. The roots are $\lambda = i$ and $\lambda = -i$. Then half of $e^{it} + e^{-it}$ gives the solution $y = \cos t$.

As a first-order system, the initial values $y(0) = 1$, $y'(0) = 0$ go into $u(0) = (1,0)$:

Use $y'' = -y$  $$\frac{du}{dt} = \frac{d}{dt}\begin{bmatrix}y\\y'\end{bmatrix} = \begin{bmatrix}0&1\\-1&0\end{bmatrix}\begin{bmatrix}y\\y'\end{bmatrix} = Au. \quad (10)$$

The eigenvalues of $A$ are again the same $\lambda = i$ and $\lambda = -i$ (no surprise). $A$ is antisymmetric with eigenvectors $x_1 = (1, i)$ and $x_2 = (1, -i)$. The combination that matches $u(0) = (1,0)$ is $\frac12(x_1 + x_2)$. Step 2 multiplies the $x$'s by $e^{it}$ and $e^{-it}$. Step 3 combines the pure oscillations into $u(t)$ to find $y = \cos t$ as expected:

$$u(t) = \frac12e^{it}\begin{bmatrix}1\\i\end{bmatrix} + \frac12e^{-it}\begin{bmatrix}1\\-i\end{bmatrix} = \begin{bmatrix}\cos t\\-\sin t\end{bmatrix}. \quad\text{This is}\quad \begin{bmatrix}y(t)\\y'(t)\end{bmatrix}.$$

All good. The vector $u = (\cos t, -\sin t)$ goes around a circle (Figure 6.3). The radius is 1 because $\cos^2 t + \sin^2 t = 1$.

Difference Equations (optional)

To display a circle on a screen, replace $y'' = -y$ by a difference equation. Here are three choices using $Y(t + \Delta t) - 2Y(t) + Y(t - \Delta t)$. Divide by $(\Delta t)^2$ to approximate $y''$.

$$\frac{Y_{n+1} - 2Y_n + Y_{n-1}}{(\Delta t)^2} = \begin{cases} -Y_{n-1} & \textbf{F}\ \ \text{Forward from } n-1 \\ -Y_n & \textbf{C}\ \ \text{Centered at time } n \\ -Y_{n+1} & \textbf{B}\ \ \text{Backward from } n+1 \end{cases} \quad (11)$$

Figure 6.3 shows the exact $y(t) = \cos t$ completing a circle at $t = 2\pi$. The three difference methods don't complete a perfect circle in 32 time steps of length $\Delta t = 2\pi/32$. Those pictures will be explained by eigenvalues:

Forward $|\lambda| > 1$ (spiral out)  Centered $|\lambda| = 1$ (best)  Backward $|\lambda| < 1$ (spiral in)


The 2-step equations (11) reduce to 1-step systems $U_{n+1} = AU_n$. Instead of $u = (y, y')$ the discrete unknown is $U_n = (Y_n, Z_n)$. We take $n$ time steps $\Delta t$ starting from $U_0$:

Forward  $$\begin{aligned} Y_{n+1} &= Y_n + \Delta t\, Z_n \\ Z_{n+1} &= Z_n - \Delta t\, Y_n \end{aligned} \quad\text{becomes}\quad U_{n+1} = \begin{bmatrix}1 & \Delta t\\ -\Delta t & 1\end{bmatrix}\begin{bmatrix}Y_n\\Z_n\end{bmatrix} = AU_n. \quad (12)$$

Those are like $Y' = Z$ and $Z' = -Y$. They are first order equations involving times $n$ and $n + 1$. Eliminating $Z$ would bring back the “forward” second order equation (11 F).

My question is simple. Do the points $(Y_n, Z_n)$ stay on the circle $Y^2 + Z^2 = 1$? No, they are growing to infinity in Figure 6.3. We are taking powers $A^n$ and not $e^{At}$, so we test the magnitude $|\lambda|$ and not the real parts of the eigenvalues.

Eigenvalues of $A$  $\lambda = 1 \pm i\Delta t$  Then $|\lambda| > 1$ and $(Y_n, Z_n)$ spirals out

Figure 6.3: Exact $u = (\cos t, -\sin t)$ on a circle. Forward Euler spirals out (32 steps).


The backward choice in (11 B) will do the opposite in Figure 6.4. Notice the new $A$:

Backward  $$\begin{aligned} Y_{n+1} &= Y_n + \Delta t\, Z_{n+1} \\ Z_{n+1} &= Z_n - \Delta t\, Y_{n+1} \end{aligned} \quad\text{is}\quad \begin{bmatrix}1 & -\Delta t\\ \Delta t & 1\end{bmatrix}\begin{bmatrix}Y_{n+1}\\Z_{n+1}\end{bmatrix} = \begin{bmatrix}Y_n\\Z_n\end{bmatrix} = U_n. \quad (13)$$

That matrix has eigenvalues $1 \pm i\Delta t$. But we invert it to reach $U_{n+1}$ from $U_n$. Then $|\lambda| < 1$ explains why the solution spirals in to $(0,0)$ for backward differences.

On the right side of Figure 6.4 you see 32 steps with the centered choice. The solution stays close to the circle (Problem 28) if $\Delta t < 2$. This is the leapfrog method, constantly used. The second difference $Y_{n+1} - 2Y_n + Y_{n-1}$ “leaps over” the center value $Y_n$ in (11).


This is the way a chemist follows the motion of molecules (molecular dynamics leads 
to giant computations). Computational science is lively because one differential equation 
can be replaced by many difference equations—some unstable, some stable, some neutral. 
Problem 30 has a fourth (very good) method that stays right on the circle. 
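The three difference methods can be compared in a few lines. This is a minimal sketch, assuming NumPy; the leapfrog matrix is the one-step form that Problem 28 derives (an assumption here, since this section does not write it out), and the function names are illustrative. It prints $|\lambda|$ for each one-step matrix and the distance of $U_{32}$ from the origin after 32 steps of length $\Delta t = 2\pi/32$.

```python
import numpy as np

dt = 2 * np.pi / 32            # 32 steps around the circle
U0 = np.array([1.0, 0.0])      # start at (Y, Z) = (1, 0)

# One-step matrices in U_{n+1} = A U_n.
A_forward = np.array([[1.0, dt], [-dt, 1.0]])
A_backward = np.linalg.inv(np.array([[1.0, -dt], [dt, 1.0]]))
# Leapfrog: Y_{n+1} = Y_n + dt*Z_n, then Z_{n+1} = Z_n - dt*Y_{n+1}.
A_leapfrog = np.array([[1.0, dt], [-dt, 1.0 - dt**2]])

def radius_after(A, steps=32):
    """Length of U_n = A^n U_0 after the given number of steps."""
    U = U0.copy()
    for _ in range(steps):
        U = A @ U
    return np.linalg.norm(U)

def eigenvalue_magnitudes(A):
    """Absolute values |lambda| of the eigenvalues of A."""
    return np.abs(np.linalg.eigvals(A))

for name, A in [("forward", A_forward),
                ("backward", A_backward),
                ("leapfrog", A_leapfrog)]:
    print(name, eigenvalue_magnitudes(A), radius_after(A))
```

Forward differences multiply the radius by $\sqrt{1 + (\Delta t)^2}$ at every step and backward differences divide by it. Leapfrog keeps $|\lambda| = 1$, so the points stay near the circle without landing exactly on it.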


Real engineering and real physics deal with systems (not just a single mass at one point). The unknown $y$ is a vector. The coefficient of $y''$ is a mass matrix $M$, with $n$ masses. The coefficient of $y$ is a stiffness matrix $K$, not a number $k$. The coefficient of $y'$ is a damping matrix which might be zero.

The vector equation $My'' + Ky = f$ is a major part of computational mechanics. It is controlled by the eigenvalues of $M^{-1}K$ in $Kx = \lambda Mx$.

Figure 6.4: Backward differences spiral in. Leapfrog stays near the correct circle.


Stability of 2 by 2 Matrices 


For the solution of $du/dt = Au$, there is a fundamental question. Does the solution approach $u = 0$ as $t \to \infty$? Is the problem stable, by dissipating energy? A solution that includes $e^t$ is unstable. Stability depends on the eigenvalues of $A$.

The complete solution $u(t)$ is built from pure solutions $e^{\lambda t}x$. If the eigenvalue $\lambda$ is real, we know exactly when $e^{\lambda t}$ will approach zero: The number $\lambda$ must be negative. If the eigenvalue is a complex number $\lambda = r + is$, the real part $r$ must be negative. When $e^{\lambda t}$ splits into $e^{rt}e^{ist}$, the factor $e^{ist}$ has absolute value fixed at 1:

$$e^{ist} = \cos st + i\sin st \quad\text{has}\quad |e^{ist}|^2 = \cos^2 st + \sin^2 st = 1.$$

The real part of $\lambda$ controls the growth ($r > 0$) or the decay ($r < 0$).

The question is: Which matrices have negative eigenvalues? More accurately, when are the real parts of the $\lambda$'s all negative? 2 by 2 matrices allow a clear answer.

Stability  $A$ is stable and $u(t) \to 0$ when all eigenvalues $\lambda$ have negative real parts. The 2 by 2 matrix $A = \begin{bmatrix}a&b\\c&d\end{bmatrix}$ must pass two tests:

$\lambda_1 + \lambda_2 < 0$  The trace $T = a + d$ must be negative.
$\lambda_1\lambda_2 > 0$  The determinant $D = ad - bc$ must be positive.


Reason If the $\lambda$'s are real and negative, their sum is negative. This is the trace $T$. Their product is positive. This is the determinant $D$. The argument also goes in the reverse direction. If $D = \lambda_1\lambda_2$ is positive, then $\lambda_1$ and $\lambda_2$ have the same sign. If $T = \lambda_1 + \lambda_2$ is negative, that sign will be negative. We can test $T$ and $D$.

If the $\lambda$'s are complex numbers, they must have the form $r + is$ and $r - is$. Otherwise $T$ and $D$ will not be real. The determinant $D$ is automatically positive, since $(r + is)(r - is) = r^2 + s^2$. The trace $T$ is $r + is + r - is = 2r$. So a negative trace $T$ means that the real part $r$ is negative and the matrix is stable. Q.E.D.

Figure 6.5 shows the parabola $T^2 = 4D$ separating real $\lambda$'s from complex $\lambda$'s. Solving $\lambda^2 - T\lambda + D = 0$ involves the square root $\sqrt{T^2 - 4D}$. This is real below the parabola and imaginary above it. The stable region is the upper left quarter of the figure,
where the trace $T$ is negative and the determinant $D$ is positive.
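The trace and determinant test can be compared with a direct computation of eigenvalues. This is a minimal sketch, assuming NumPy; the function names and the four sample matrices are illustrative choices, not taken from the text.

```python
import numpy as np

def is_stable_by_trace_det(A):
    """2 by 2 test from the text: trace T < 0 and determinant D > 0."""
    A = np.asarray(A, dtype=float)
    return bool(np.trace(A) < 0 and np.linalg.det(A) > 0)

def is_stable_by_eigenvalues(A):
    """Direct test: every eigenvalue has a negative real part."""
    lam = np.linalg.eigvals(np.asarray(A, dtype=float))
    return bool(np.all(lam.real < 0))

# Hypothetical sample matrices, one for each kind of behavior.
examples = [
    ("complex, stable",   [[-1.0, 2.0], [-2.0, -1.0]]),   # T = -2, D = 5
    ("real, stable",      [[0.0, 1.0], [-3.0, -4.0]]),    # T = -4, D = 3
    ("saddle, unstable",  [[1.0, 0.0], [0.0, -2.0]]),     # D = -2 < 0
    ("both > 0, unstable", [[2.0, 1.0], [0.0, 3.0]]),     # T = 5 > 0
]

for name, A in examples:
    print(name, is_stable_by_trace_det(A), is_stable_by_eigenvalues(A))
```

The two tests agree on every 2 by 2 matrix. The trace and determinant version avoids computing eigenvalues at all.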


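[Figure 6.5 plots the determinant $D$ (vertical axis) against the trace $T$ (horizontal axis), with the parabola $T^2 = 4D$. Above the parabola the $\lambda$'s are complex: both Re $\lambda < 0$ (stable) on the left, both Re $\lambda > 0$ (unstable) on the right. Between the parabola and the $T$ axis the $\lambda$'s are real: both $\lambda < 0$ (stable) on the left, both $\lambda > 0$ (unstable) on the right. $D < 0$ means $\lambda_1 < 0$ and $\lambda_2 > 0$: unstable.]

Figure 6.5: A 2 by 2 matrix is stable ($u(t) \to 0$) when trace $< 0$ and det $> 0$.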


The Exponential of a Matrix 


We want to write the solution $u(t)$ in a new form $e^{At}u(0)$. First we have to say what $e^{At}$ means, with a matrix in the exponent. To define $e^{At}$ for matrices, we copy $e^x$ for numbers. The direct definition of $e^x$ is by the infinite series $1 + x + \frac12x^2 + \frac16x^3 + \cdots$. When you change $x$ to a square matrix $At$, this series defines the matrix exponential $e^{At}$.

Matrix exponential $e^{At}$  $$e^{At} = I + At + \tfrac12(At)^2 + \tfrac16(At)^3 + \cdots \quad (14)$$

Its $t$ derivative is $Ae^{At}$  $$A + A^2t + \tfrac12A^3t^2 + \cdots = Ae^{At}$$

Its eigenvalues are $e^{\lambda t}$  $$\left(I + At + \tfrac12(At)^2 + \cdots\right)x = \left(1 + \lambda t + \tfrac12(\lambda t)^2 + \cdots\right)x$$

The number that divides $(At)^n$ is “$n$ factorial”. This is $n! = (1)(2)\cdots(n-1)(n)$. The factorials after 1, 2, 6 are $4! = 24$ and $5! = 120$. They grow quickly. The series always converges and its derivative is always $Ae^{At}$. Therefore $e^{At}u(0)$ solves the differential equation with one quick formula, even if there is a shortage of eigenvectors.

I will use this series in Example 4, to see it work with a missing eigenvector. It will produce $te^{\lambda t}$. First let me reach $Xe^{\Lambda t}X^{-1}$ in the good (diagonalizable) case.

This chapter emphasizes how to find $u(t) = e^{At}u(0)$ by diagonalization. Assume $A$ does have $n$ independent eigenvectors, so it is diagonalizable. Substitute $A = X\Lambda X^{-1}$ into the series for $e^{At}$. Whenever $X\Lambda X^{-1}X\Lambda X^{-1}$ appears, cancel $X^{-1}X$ in the middle:


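Use the series  $$e^{At} = I + X\Lambda X^{-1}t + \tfrac12(X\Lambda X^{-1}t)(X\Lambda X^{-1}t) + \cdots$$

Factor out $X$ and $X^{-1}$  $$= X\left[I + \Lambda t + \tfrac12(\Lambda t)^2 + \cdots\right]X^{-1} \quad (15)$$

$e^{At}$ is diagonalized!  $$e^{At} = Xe^{\Lambda t}X^{-1}.$$

$e^{At}$ has the same eigenvector matrix $X$ as $A$. Then $\Lambda$ is a diagonal matrix and so is $e^{\Lambda t}$. The numbers $e^{\lambda_i t}$ are on the diagonal. Multiply $Xe^{\Lambda t}X^{-1}u(0)$ to recognize $u(t)$:

$$e^{At}u(0) = Xe^{\Lambda t}X^{-1}u(0) = \begin{bmatrix}x_1 & \cdots & x_n\end{bmatrix}\begin{bmatrix}e^{\lambda_1 t} & & \\ & \ddots & \\ & & e^{\lambda_n t}\end{bmatrix}\begin{bmatrix}c_1\\\vdots\\c_n\end{bmatrix}. \quad (16)$$

This solution $e^{At}u(0)$ is the same answer that came in equation (6) from three steps:

1. $u(0) = c_1x_1 + \cdots + c_nx_n = Xc$. Here we need $n$ independent eigenvectors.

2. Multiply each $x_i$ by its growth factor $e^{\lambda_i t}$ to follow it forward in time.

3. The best form of $e^{At}u(0)$ is $u(t) = c_1e^{\lambda_1 t}x_1 + \cdots + c_ne^{\lambda_n t}x_n$. (17)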
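The series and the diagonalized form can be compared numerically. This is a minimal sketch, assuming NumPy and assuming that Example 1 used the matrix with rows $(0, 1)$ and $(1, 0)$, whose eigenvalues are $1$ and $-1$ with eigenvectors $(1,1)$ and $(1,-1)$; the function names and the cutoff of 30 terms are illustrative choices.

```python
import numpy as np

def expm_series(A, t, terms=30):
    """Partial sum of I + At + (At)^2/2! + ... using the given number of terms."""
    At = np.asarray(A, dtype=float) * t
    total = np.eye(At.shape[0])
    term = np.eye(At.shape[0])
    for n in range(1, terms):
        term = term @ At / n        # term is now (At)^n / n!
        total = total + term
    return total

def expm_diagonalized(A, t):
    """X exp(Lambda t) X^{-1}; needs n independent eigenvectors."""
    lam, X = np.linalg.eig(np.asarray(A, dtype=float))
    return (X * np.exp(lam * t)) @ np.linalg.inv(X)

A = np.array([[0.0, 1.0], [1.0, 0.0]])   # assumed matrix of Example 1
u0 = np.array([4.0, 2.0])
t = 0.7

u_series = expm_series(A, t) @ u0
u_diag = expm_diagonalized(A, t) @ u0
# Combination of pure solutions with C = 3 and D = 1.
u_formula = 3 * np.exp(t) * np.array([1.0, 1.0]) + np.exp(-t) * np.array([1.0, -1.0])

print(u_series, u_diag, u_formula)
```

All three vectors agree to rounding error. The series needs no eigenvectors, which is why it still works when the diagonalized form is not available.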


Example 4 When you substitute $y = e^{\lambda t}$ into $y'' - 2y' + y = 0$, you get an equation with repeated roots: $\lambda^2 - 2\lambda + 1 = 0$ is $(\lambda - 1)^2 = 0$ with $\lambda = 1, 1$. A differential equations course would propose $e^t$ and $te^t$ as two independent solutions. Here we discover why.

Linear algebra reduces $y'' - 2y' + y = 0$ to a vector equation for $u = (y, y')$:

$$\frac{d}{dt}\begin{bmatrix}y\\y'\end{bmatrix} = \begin{bmatrix}y'\\2y' - y\end{bmatrix} \quad\text{is}\quad \frac{du}{dt} = Au = \begin{bmatrix}0&1\\-1&2\end{bmatrix}u. \quad (18)$$

$A$ has a repeated eigenvalue $\lambda = 1, 1$ (with trace $= 2$ and $\det A = 1$). The only eigenvectors are multiples of $x = (1,1)$. Diagonalization is not possible, $A$ has only one line of eigenvectors. So we compute $e^{At}$ from its definition as a series:

Short series  $$e^{At} = e^{It}e^{(A-I)t} = e^t\left[I + (A - I)t\right]. \quad (19)$$

That “infinite” series for $e^{(A-I)t}$ ended quickly because $(A - I)^2$ is the zero matrix! You can see $te^t$ in equation (19). The first component of $e^{At}u(0)$ is our answer $y(t)$:

$$\begin{bmatrix}y\\y'\end{bmatrix} = e^t\left[I + \begin{bmatrix}-1&1\\-1&1\end{bmatrix}t\right]\begin{bmatrix}y(0)\\y'(0)\end{bmatrix} \qquad y(t) = e^ty(0) - te^ty(0) + te^ty'(0).$$

Example 5 Use the infinite series to find $e^{At}$ for $A = \begin{bmatrix}0&1\\-1&0\end{bmatrix}$. Notice that $A^4 = I$:

$$A = \begin{bmatrix}0&1\\-1&0\end{bmatrix} \quad A^2 = \begin{bmatrix}-1&0\\0&-1\end{bmatrix} \quad A^3 = \begin{bmatrix}0&-1\\1&0\end{bmatrix} \quad A^4 = \begin{bmatrix}1&0\\0&1\end{bmatrix}$$

$A^5, A^6, A^7, A^8$ will be a repeat of $A, A^2, A^3, A^4$. The top right corner has $1, 0, -1, 0$ repeating over and over in powers of $A$. Then $t - \frac16t^3$ starts the infinite series for $e^{At}$ in that top right corner, and $1 - \frac12t^2$ starts the top left corner:

$$I + At + \tfrac12(At)^2 + \tfrac16(At)^3 + \cdots = \begin{bmatrix}1 - \frac12t^2 + \cdots & t - \frac16t^3 + \cdots\\ -t + \frac16t^3 - \cdots & 1 - \frac12t^2 + \cdots\end{bmatrix}.$$

The top row of that matrix $e^{At}$ shows the infinite series for the cosine and sine!

$$A = \begin{bmatrix}0&1\\-1&0\end{bmatrix} \qquad e^{At} = \begin{bmatrix}\cos t & \sin t\\ -\sin t & \cos t\end{bmatrix} \quad (20)$$

$A$ is an antisymmetric matrix ($A^T = -A$). Its exponential $e^{At}$ is an orthogonal matrix. The eigenvalues of $A$ are $i$ and $-i$. The eigenvalues of $e^{At}$ are $e^{it}$ and $e^{-it}$. Three rules:

1 $e^{At}$ always has the inverse $e^{-At}$.

2 The eigenvalues of $e^{At}$ are always $e^{\lambda t}$.

3 When $A$ is antisymmetric, $e^{At}$ is orthogonal. Inverse = transpose = $e^{-At}$.

Antisymmetric is the same as “skew-symmetric”. Those matrices have pure imaginary eigenvalues like $i$ and $-i$. Then $e^{At}$ has eigenvalues like $e^{it}$ and $e^{-it}$. Their absolute value is 1: neutral stability, pure oscillation, energy is conserved. So $\|u(t)\| = \|u(0)\|$.
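Both closed forms can be checked against the series itself. This is a minimal sketch, assuming NumPy, with the matrices of Examples 4 and 5 taken to be the companion matrix of $y'' - 2y' + y = 0$ and the matrix of $y'' = -y$; the function name `expm_series` and the cutoff of 40 terms are illustrative.

```python
import numpy as np

def expm_series(A, t, terms=40):
    """Partial sum of I + At + (At)^2/2! + ... using the given number of terms."""
    At = np.asarray(A, dtype=float) * t
    total = np.eye(At.shape[0])
    term = np.eye(At.shape[0])
    for n in range(1, terms):
        term = term @ At / n        # term is now (At)^n / n!
        total = total + term
    return total

t = 0.9

# Example 4: repeated eigenvalue 1 with one line of eigenvectors.
A4 = np.array([[0.0, 1.0], [-1.0, 2.0]])
N = A4 - np.eye(2)                               # (A - I), whose square is zero
closed_form_4 = np.exp(t) * (np.eye(2) + N * t)  # e^t [I + (A - I) t]

# Example 5: antisymmetric A, so exp(At) should be a rotation.
A5 = np.array([[0.0, 1.0], [-1.0, 0.0]])
rotation = np.array([[np.cos(t), np.sin(t)],
                     [-np.sin(t), np.cos(t)]])

print(np.max(np.abs(expm_series(A4, t) - closed_form_4)))
print(np.max(np.abs(expm_series(A5, t) - rotation)))
```

Both printed differences are at rounding level. The first confirms that $te^t$ appears without any second eigenvector; the second confirms that the exponential of an antisymmetric matrix is orthogonal.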


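Our final example has a triangular matrix $A$. Then the eigenvector matrix $X$ is triangular. So are $X^{-1}$ and $e^{At}$. You will see the two forms of the solution: a combination of eigenvectors and the short form $e^{At}u(0)$.

Example 6 Solve $\dfrac{du}{dt} = Au = \begin{bmatrix}1&1\\0&2\end{bmatrix}u$ starting from $u(0) = \begin{bmatrix}2\\1\end{bmatrix}$ at $t = 0$.

Solution The eigenvalues 1 and 2 are on the diagonal of $A$ (since $A$ is triangular). The eigenvectors are $(1,0)$ and $(1,1)$. The starting $u(0)$ is $x_1 + x_2$ so $c_1 = c_2 = 1$. Then $u(t)$ is the same combination of pure exponentials (no $te^{\lambda t}$ when $\lambda = 1$ and 2):

Solution to $u' = Au$  $$u(t) = e^{t}\begin{bmatrix}1\\0\end{bmatrix} + e^{2t}\begin{bmatrix}1\\1\end{bmatrix}.$$

That is the clearest form. But the matrix form with $e^{At}$ produces $u(t)$ for every $u(0)$:

$$u(t) = Xe^{\Lambda t}X^{-1}u(0) \quad\text{is}\quad \begin{bmatrix}1&1\\0&1\end{bmatrix}\begin{bmatrix}e^{t}& \\ &e^{2t}\end{bmatrix}\begin{bmatrix}1&-1\\0&1\end{bmatrix}u(0) = \begin{bmatrix}e^{t} & e^{2t} - e^{t}\\ 0 & e^{2t}\end{bmatrix}u(0).$$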
That last matrix is $e^{At}$. It is nice because $A$ is triangular. The situation is the same as for $Ax = b$ and inverses. We don't need $A^{-1}$ to find $x$, and we don't need $e^{At}$ to solve $du/dt = Au$. But as quick formulas for the answers, $A^{-1}b$ and $e^{At}u(0)$ are unbeatable.

REVIEW OF THE KEY IDEAS

1. The equation $u' = Au$ is linear with constant coefficients in $A$. Start from $u(0)$.

2. Its solution is usually a combination of exponentials, involving every $\lambda$ and $x$:

Independent eigenvectors  $u(t) = c_1e^{\lambda_1 t}x_1 + \cdots + c_ne^{\lambda_n t}x_n$.

3. The constants $c_1, \ldots, c_n$ are determined by $u(0) = c_1x_1 + \cdots + c_nx_n = Xc$.

4. $u(t)$ approaches zero (stability) if every $\lambda$ has negative real part: All $e^{\lambda t} \to 0$.

5. Solutions have the short form $u(t) = e^{At}u(0)$, with the matrix exponential $e^{At}$.

6. Equations with $y''$ reduce to $u' = Au$ by combining $y$ and $y'$ into the vector $u$.


WORKED EXAMPLES

6.3 A Solve $y'' + 4y' + 3y = 0$ by substituting $e^{\lambda t}$ and also by linear algebra.

Solution Substituting $y = e^{\lambda t}$ yields $(\lambda^2 + 4\lambda + 3)e^{\lambda t} = 0$. That quadratic factors into $\lambda^2 + 4\lambda + 3 = (\lambda + 1)(\lambda + 3) = 0$. Therefore $\lambda_1 = -1$ and $\lambda_2 = -3$. The pure solutions are $y_1 = e^{-t}$ and $y_2 = e^{-3t}$. The complete solution $y = c_1y_1 + c_2y_2$ approaches zero.

To use linear algebra we set $u = (y, y')$. Then the vector equation is $u' = Au$:

$$\begin{aligned} dy/dt &= y' \\ dy'/dt &= -3y - 4y' \end{aligned} \quad\text{converts to}\quad \frac{du}{dt} = \begin{bmatrix}0&1\\-3&-4\end{bmatrix}u.$$

This $A$ is a “companion matrix” and its eigenvalues are again $-1$ and $-3$:

Same quadratic  $$\det(A - \lambda I) = \begin{vmatrix}-\lambda & 1\\ -3 & -4-\lambda\end{vmatrix} = \lambda^2 + 4\lambda + 3 = 0.$$

The eigenvectors of $A$ are $(1, \lambda_1)$ and $(1, \lambda_2)$. Either way, the decay in $y(t)$ comes from $e^{-t}$ and $e^{-3t}$. With constant coefficients, calculus leads to linear algebra $Ax = \lambda x$.
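The agreement between the two routes in 6.3 A can be confirmed numerically. This is a minimal sketch, assuming NumPy; the helper name `companion` is an illustrative choice. It builds the companion matrix of $y'' + by' + ky = 0$ and compares its eigenvalues with the roots of $\lambda^2 + b\lambda + k$.

```python
import numpy as np

def companion(b, k):
    """Companion matrix of y'' + b*y' + k*y = 0 acting on u = (y, y')."""
    return np.array([[0.0, 1.0], [-k, -b]])

b, k = 4.0, 3.0                       # the equation y'' + 4y' + 3y = 0
A = companion(b, k)

eigenvalues = np.sort(np.linalg.eigvals(A).real)
roots = np.sort(np.roots([1.0, b, k]).real)

print("eigenvalues of A:", eigenvalues)
print("roots of quadratic:", roots)

# Each eigenvector has the form (1, lambda).
for lam in eigenvalues:
    x = np.array([1.0, lam])
    print(lam, np.allclose(A @ x, lam * x))
```

Taking real parts is safe here because both roots are real; with $b^2 < 4k$ the roots would be complex and the `.real` calls should be removed.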


Note In linear algebra the serious danger is a shortage of eigenvectors. Our eigenvectors $(1, \lambda_1)$ and $(1, \lambda_2)$ are the same if $\lambda_1 = \lambda_2$. Then we can't diagonalize $A$. In this case we don't yet have two independent solutions to $du/dt = Au$.

In differential equations the danger is also a repeated $\lambda$. After $y = e^{\lambda t}$, a second solution has to be found. It turns out to be $y = te^{\lambda t}$. This “impure” solution (with an extra $t$) appears in the matrix exponential $e^{At}$. Example 4 showed how.


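6.3 B Find the eigenvalues and eigenvectors of $A$. Then write $u(0) = (0, 2\sqrt2, 0)$ as a combination of the eigenvectors. Solve both equations $u' = Au$ and $u'' = Au$:

$$\frac{du}{dt} = \begin{bmatrix}-2&1&0\\1&-2&1\\0&1&-2\end{bmatrix}u \quad\text{and}\quad \frac{d^2u}{dt^2} = \begin{bmatrix}-2&1&0\\1&-2&1\\0&1&-2\end{bmatrix}u \quad\text{with}\quad \frac{du}{dt}(0) = 0.$$

$u' = Au$ is like the heat equation $\partial u/\partial t = \partial^2u/\partial x^2$. Its solution $u(t)$ will decay ($A$ has negative eigenvalues).

$u'' = Au$ is like the wave equation $\partial^2u/\partial t^2 = \partial^2u/\partial x^2$. Its solution will oscillate (the square roots of $\lambda$ are imaginary).

Solution The eigenvalues and eigenvectors come from $\det(A - \lambda I) = 0$:

$$\det(A - \lambda I) = \begin{vmatrix}-2-\lambda&1&0\\1&-2-\lambda&1\\0&1&-2-\lambda\end{vmatrix} = (-2-\lambda)\left[(-2-\lambda)^2 - 2\right] = 0.$$

One eigenvalue is $\lambda = -2$, when $-2 - \lambda$ is zero. The other factor is $\lambda^2 + 4\lambda + 2$, so the other eigenvalues (also real and negative) are $\lambda = -2 \pm \sqrt2$. Find the eigenvectors:

$$\lambda = -2 \qquad (A + 2I)x = \begin{bmatrix}0&1&0\\1&0&1\\0&1&0\end{bmatrix}\begin{bmatrix}x\\y\\z\end{bmatrix} = \begin{bmatrix}0\\0\\0\end{bmatrix} \quad\text{for}\quad x_1 = \begin{bmatrix}1\\0\\-1\end{bmatrix}$$

$$\lambda = -2 - \sqrt2 \qquad (A - \lambda I)x = \begin{bmatrix}\sqrt2&1&0\\1&\sqrt2&1\\0&1&\sqrt2\end{bmatrix}\begin{bmatrix}x\\y\\z\end{bmatrix} = \begin{bmatrix}0\\0\\0\end{bmatrix} \quad\text{for}\quad x_2 = \begin{bmatrix}1\\-\sqrt2\\1\end{bmatrix}$$

$$\lambda = -2 + \sqrt2 \qquad (A - \lambda I)x = \begin{bmatrix}-\sqrt2&1&0\\1&-\sqrt2&1\\0&1&-\sqrt2\end{bmatrix}\begin{bmatrix}x\\y\\z\end{bmatrix} = \begin{bmatrix}0\\0\\0\end{bmatrix} \quad\text{for}\quad x_3 = \begin{bmatrix}1\\\sqrt2\\1\end{bmatrix}$$

The eigenvectors are orthogonal (proved in Section 6.4 for all symmetric matrices). All three $\lambda_i$ are negative. This $A$ is negative definite and $e^{At}$ decays to zero (stability). The starting $u(0) = (0, 2\sqrt2, 0)$ is $x_3 - x_2$. The solution is $u(t) = e^{\lambda_3t}x_3 - e^{\lambda_2t}x_2$.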


Heat equation In Figure 6.6a, the temperature at the center starts at $2\sqrt2$. Heat diffuses into the neighboring boxes and then to the outside boxes (frozen at $0°$). The rate of heat flow between boxes is the temperature difference. From box 2, heat flows left and right at the rate $u_1 - u_2$ and $u_3 - u_2$. So the flow out is $u_1 - 2u_2 + u_3$ in the second row of $Au$.

Wave equation $d^2u/dt^2 = Au$ has the same eigenvectors $x$. But now the eigenvalues $\lambda$ lead to oscillations $e^{i\omega t}x$ and $e^{-i\omega t}x$. The frequencies come from $\omega^2 = -\lambda$:

$$\frac{d^2}{dt^2}\left(e^{i\omega t}x\right) = A\left(e^{i\omega t}x\right) \quad\text{becomes}\quad (i\omega)^2e^{i\omega t}x = \lambda e^{i\omega t}x \quad\text{and}\quad \omega^2 = -\lambda.$$

There are two square roots of $-\lambda$, so we have $e^{i\omega t}x$ and $e^{-i\omega t}x$. With three eigenvectors this makes six solutions to $u'' = Au$. A combination will match the six components of $u(0)$ and $u'(0)$. Since $u' = 0$ in this problem, $e^{i\omega t}x$ and $e^{-i\omega t}x$ produce $2\cos\omega t\, x$.

Figure 6.6: Heat diffuses away from box 2 (left). Wave travels from box 2 (right).
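Worked example 6.3 B can be checked numerically for both equations. This is a minimal sketch, assuming NumPy; the function names `heat` and `wave` are illustrative. The wave solution uses $\cos\omega t$ with $\omega^2 = -\lambda$, which is what the pair $e^{i\omega t}x$, $e^{-i\omega t}x$ reduces to when $u'(0) = 0$.

```python
import numpy as np

A = np.array([[-2.0, 1.0, 0.0],
              [1.0, -2.0, 1.0],
              [0.0, 1.0, -2.0]])
r2 = np.sqrt(2.0)
u0 = np.array([0.0, 2 * r2, 0.0])

# Eigenpairs used in the solution: u(0) = x3 - x2.
lam2, x2 = -2.0 - r2, np.array([1.0, -r2, 1.0])
lam3, x3 = -2.0 + r2, np.array([1.0, r2, 1.0])

def heat(t):
    """Solution of u' = Au: decaying exponentials."""
    return np.exp(lam3 * t) * x3 - np.exp(lam2 * t) * x2

def wave(t):
    """Solution of u'' = Au with u'(0) = 0: cosines with omega^2 = -lambda."""
    w2, w3 = np.sqrt(-lam2), np.sqrt(-lam3)
    return np.cos(w3 * t) * x3 - np.cos(w2 * t) * x2

def wave_residual(t, h=1e-3):
    """Largest entry of |u'' - A u| using a centered second difference for u''."""
    second = (wave(t + h) - 2 * wave(t) + wave(t - h)) / h**2
    return np.max(np.abs(second - A @ wave(t)))

print(heat(0.0), wave(0.0))
print(np.linalg.norm(heat(5.0)), wave_residual(1.3))
```

The heat solution shrinks toward zero while the wave solution keeps oscillating; both start from the same $u(0)$.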


6.3 C Solve the four equations $da/dt = 0$, $db/dt = a$, $dc/dt = 2b$, $dz/dt = 3c$ in that order starting from $u(0) = (a(0), b(0), c(0), z(0))$. Solve the same equations by the matrix exponential in $u(t) = e^{At}u(0)$.

Four equations, $\lambda = 0, 0, 0, 0$, eigenvalues on the diagonal:
$$\frac{d}{dt}\begin{bmatrix}a\\b\\c\\z\end{bmatrix} = \begin{bmatrix}0&0&0&0\\1&0&0&0\\0&2&0&0\\0&0&3&0\end{bmatrix}\begin{bmatrix}a\\b\\c\\z\end{bmatrix} \quad\text{is}\quad \frac{du}{dt} = Au.$$

First find $A^2, A^3, A^4$ and $e^{At} = I + At + \frac12(At)^2 + \frac16(At)^3$. Why does the series stop? Why is it true that $(e^{A})(e^{A}) = (e^{2A})$? Always $e^{As}$ times $e^{At}$ is $e^{A(s+t)}$.

Solution 1 Integrate $da/dt = 0$, then $db/dt = a$, then $dc/dt = 2b$ and $dz/dt = 3c$:

$$\begin{aligned} a(t) &= a(0) \\ b(t) &= ta(0) + b(0) \\ c(t) &= t^2a(0) + 2tb(0) + c(0) \\ z(t) &= t^3a(0) + 3t^2b(0) + 3tc(0) + z(0) \end{aligned}$$

The 4 by 4 matrix which is multiplying $a(0), b(0), c(0), z(0)$ to produce $a(t), b(t), c(t), z(t)$ must be the same $e^{At}$ as below.

Solution 2 The powers of $A$ (strictly triangular) are all zero after $A^3$.

$$A = \begin{bmatrix}0&0&0&0\\1&0&0&0\\0&2&0&0\\0&0&3&0\end{bmatrix} \quad A^2 = \begin{bmatrix}0&0&0&0\\0&0&0&0\\2&0&0&0\\0&6&0&0\end{bmatrix} \quad A^3 = \begin{bmatrix}0&0&0&0\\0&0&0&0\\0&0&0&0\\6&0&0&0\end{bmatrix} \quad A^4 = 0$$


The diagonals move down at each step. So the series for $e^{At}$ stops after four terms:

Same $e^{At}$ as in Solution 1  $$e^{At} = I + At + \frac{(At)^2}{2} + \frac{(At)^3}{6} = \begin{bmatrix}1&&&\\t&1&&\\t^2&2t&1&\\t^3&3t^2&3t&1\end{bmatrix}$$

The square of $e^{A}$ is $e^{2A}$. But $e^{A}e^{B}$ and $e^{B}e^{A}$ and $e^{A+B}$ can be all different.
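The finite series in 6.3 C is easy to confirm. This is a minimal sketch, assuming NumPy; the function names `exp_At` and `pascal_form` are illustrative. It checks that $A^4 = 0$, that four terms of the series give the triangular matrix above, and that $e^{A}e^{A} = e^{2A}$.

```python
import numpy as np

A = np.array([[0.0, 0.0, 0.0, 0.0],
              [1.0, 0.0, 0.0, 0.0],
              [0.0, 2.0, 0.0, 0.0],
              [0.0, 0.0, 3.0, 0.0]])

def exp_At(t):
    """Four terms of the series; exact here because A^4 = 0."""
    At = A * t
    return np.eye(4) + At + (At @ At) / 2 + (At @ At @ At) / 6

def pascal_form(t):
    """The matrix found by integrating the four equations in order."""
    return np.array([[1.0, 0.0, 0.0, 0.0],
                     [t, 1.0, 0.0, 0.0],
                     [t**2, 2 * t, 1.0, 0.0],
                     [t**3, 3 * t**2, 3 * t, 1.0]])

print(np.linalg.matrix_power(A, 4))
print(exp_At(1.0))
```

At $t = 1$ the rows of $e^{A}$ are the rows $1$; $1, 1$; $1, 2, 1$; $1, 3, 3, 1$ of Pascal's triangle.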
Problem Set 6.3

1 Find two $\lambda$'s and $x$'s so that $u = e^{\lambda t}x$ solves

$$\frac{du}{dt} = \begin{bmatrix}4&3\\0&1\end{bmatrix}u.$$

What combination $u = c_1e^{\lambda_1t}x_1 + c_2e^{\lambda_2t}x_2$ starts from $u(0) = (5, -2)$?

2 Solve Problem 1 for $u = (y, z)$ by back substitution, $z$ before $y$:

Solve $\dfrac{dz}{dt} = z$ from $z(0) = -2$. Then solve $\dfrac{dy}{dt} = 4y + 3z$ from $y(0) = 5$.

The solution for $y$ will be a combination of $e^{4t}$ and $e^{t}$. The $\lambda$'s are 4 and 1.


3 (a) If every column of $A$ adds to zero, why is $\lambda = 0$ an eigenvalue?

(b) With negative diagonal and positive off-diagonal adding to zero, $u' = Au$ will be a “continuous” Markov equation. Find the eigenvalues and eigenvectors, and the steady state as $t \to \infty$.

Solve $\dfrac{du}{dt} = \begin{bmatrix}-2&3\\2&-3\end{bmatrix}u$ with $u(0) = \begin{bmatrix}4\\1\end{bmatrix}$. What is $u(\infty)$?


4 A door is opened between rooms that hold $v(0) = 30$ people and $w(0) = 10$ people. The movement between rooms is proportional to the difference $v - w$:

$$\frac{dv}{dt} = w - v \quad\text{and}\quad \frac{dw}{dt} = v - w.$$

Show that the total $v + w$ is constant (40 people). Find the matrix in $du/dt = Au$ and its eigenvalues and eigenvectors. What are $v$ and $w$ at $t = 1$ and $t = \infty$?

5 Reverse the diffusion of people in Problem 4 to $du/dt = -Au$:

$$\frac{dv}{dt} = v - w \quad\text{and}\quad \frac{dw}{dt} = w - v.$$

The total $v + w$ still remains constant. How are the $\lambda$'s changed now that $A$ is changed to $-A$? But show that $v(t)$ grows to infinity from $v(0) = 30$.


6 $A$ has real eigenvalues but $B$ has complex eigenvalues:

$$A = \begin{bmatrix}a&1\\1&a\end{bmatrix} \quad B = \begin{bmatrix}b&-1\\1&b\end{bmatrix} \quad (a \text{ and } b \text{ are real})$$

Find the conditions on $a$ and $b$ so that all solutions of $du/dt = Au$ and $dv/dt = Bv$ approach zero as $t \to \infty$: Re $\lambda < 0$ for all eigenvalues.

7 Suppose $P$ is the projection matrix onto the 45° line $y = x$ in $\mathbf{R}^2$. What are its eigenvalues? If $du/dt = -Pu$ (notice minus sign) can you find the limit of $u(t)$ at $t = \infty$ starting from $u(0) = (3, 1)$?


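8 The rabbit population shows fast growth (from $6r$) but loss to wolves (from $-2w$). The wolf population always grows in this model ($-w^2$ would control wolves):

$$\frac{dr}{dt} = 6r - 2w \quad\text{and}\quad \frac{dw}{dt} = 2r + w.$$

Find the eigenvalues and eigenvectors. If $r(0) = w(0) = 30$ what are the populations at time $t$? After a long time, what is the ratio of rabbits to wolves?

9 (a) Write $(4, 0)$ as a combination $c_1x_1 + c_2x_2$ of these two eigenvectors of $A$:

$$\begin{bmatrix}0&1\\-1&0\end{bmatrix}\begin{bmatrix}1\\i\end{bmatrix} = i\begin{bmatrix}1\\i\end{bmatrix} \qquad \begin{bmatrix}0&1\\-1&0\end{bmatrix}\begin{bmatrix}1\\-i\end{bmatrix} = -i\begin{bmatrix}1\\-i\end{bmatrix}.$$

(b) The solution to $du/dt = Au$ starting from $(4, 0)$ is $c_1e^{it}x_1 + c_2e^{-it}x_2$. Substitute $e^{it} = \cos t + i\sin t$ and $e^{-it} = \cos t - i\sin t$ to find $u(t)$.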


Questions 10-13 reduce second-order equations to first-order systems for (y, y’). 


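10 Find $A$ to change the scalar equation $y'' = 5y' + 4y$ into a vector equation for $u = (y, y')$:

$$\frac{du}{dt} = \begin{bmatrix}y'\\y''\end{bmatrix} = \begin{bmatrix}\ \ &\ \ \\\ \ &\ \ \end{bmatrix}\begin{bmatrix}y\\y'\end{bmatrix} = Au.$$

What are the eigenvalues of $A$? Find them also by substituting $y = e^{\lambda t}$ into $y'' = 5y' + 4y$.

11 The solution to $y'' = 0$ is a straight line $y = C + Dt$. Convert to a matrix equation:

$$\frac{d}{dt}\begin{bmatrix}y\\y'\end{bmatrix} = \begin{bmatrix}0&1\\0&0\end{bmatrix}\begin{bmatrix}y\\y'\end{bmatrix} \quad\text{has the solution}\quad \begin{bmatrix}y\\y'\end{bmatrix} = e^{At}\begin{bmatrix}y(0)\\y'(0)\end{bmatrix}.$$

This matrix $A$ has $\lambda = 0, 0$ and it cannot be diagonalized. Find $A^2$ and compute $e^{At} = I + At + \frac12A^2t^2 + \cdots$. Multiply your $e^{At}$ times $(y(0), y'(0))$ to check the straight line $y(t) = y(0) + y'(0)t$.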


12 Substitute $y = e^{\lambda t}$ into $y'' = 6y' - 9y$ to show that $\lambda = 3$ is a repeated root. This is trouble; we need a second solution after $e^{3t}$. The matrix equation is

$$\frac{d}{dt}\begin{bmatrix}y\\y'\end{bmatrix} = \begin{bmatrix}0&1\\-9&6\end{bmatrix}\begin{bmatrix}y\\y'\end{bmatrix}.$$

Show that this matrix has $\lambda = 3, 3$ and only one line of eigenvectors. Trouble here too. Show that the second solution to $y'' = 6y' - 9y$ is $y = te^{3t}$.

13 (a) Write down two familiar functions that solve the equation $d^2y/dt^2 = -9y$. Which one starts with $y(0) = 3$ and $y'(0) = 0$?

(b) This second-order equation $y'' = -9y$ produces a vector equation $u' = Au$:

$$u = \begin{bmatrix}y\\y'\end{bmatrix} \qquad \frac{du}{dt} = \begin{bmatrix}y'\\y''\end{bmatrix} = \begin{bmatrix}0&1\\-9&0\end{bmatrix}\begin{bmatrix}y\\y'\end{bmatrix} = Au.$$

Find $u(t)$ by using the eigenvalues and eigenvectors of $A$: $u(0) = (3, 0)$.


14 The matrix in this question is skew-symmetric ($A^T = -A$):

$$\frac{du}{dt} = \begin{bmatrix}0&c&-b\\-c&0&a\\b&-a&0\end{bmatrix}u \quad\text{or}\quad \begin{aligned} u_1' &= cu_2 - bu_3 \\ u_2' &= au_3 - cu_1 \\ u_3' &= bu_1 - au_2. \end{aligned}$$

(a) The derivative of $\|u(t)\|^2 = u_1^2 + u_2^2 + u_3^2$ is $2u_1u_1' + 2u_2u_2' + 2u_3u_3'$. Substitute $u_1', u_2', u_3'$ to get zero. Then $\|u(t)\|^2$ stays equal to $\|u(0)\|^2$.

(b) When $A$ is skew-symmetric, $Q = e^{At}$ is orthogonal. Prove $Q^T = e^{-At}$ from the series for $Q = e^{At}$. Then $Q^TQ = I$.

15 A particular solution to $du/dt = Au - b$ is $u_p = A^{-1}b$, if $A$ is invertible. The usual solutions to $du/dt = Au$ give $u_n$. Find the complete solution $u = u_p + u_n$:

(a) $\dfrac{du}{dt} = u - 4$  (b) $\dfrac{du}{dt} = \begin{bmatrix}1&0\\1&1\end{bmatrix}u - \begin{bmatrix}4\\6\end{bmatrix}.$

16 If $c$ is not an eigenvalue of $A$, substitute $u = e^{ct}v$ and find a particular solution to $du/dt = Au - e^{ct}b$. How does it break down when $c$ is an eigenvalue of $A$? The “nullspace” of $du/dt = Au$ contains the usual solutions $e^{\lambda_it}x_i$.


17 Find a matrix $A$ to illustrate each of the unstable regions in Figure 6.5:

(a) $\lambda_1 < 0$ and $\lambda_2 > 0$  (b) $\lambda_1 > 0$ and $\lambda_2 > 0$  (c) $\lambda = a \pm ib$ with $a > 0$.

Questions 18-27 are about the matrix exponential $e^{At}$.

18 Write five terms of the infinite series for $e^{At}$. Take the $t$ derivative of each term. Show that you have four terms of $Ae^{At}$. Conclusion: $e^{At}u_0$ solves $u' = Au$.

19 The matrix $B = \begin{bmatrix}0&-4\\0&0\end{bmatrix}$ has $B^2 = 0$. Find $e^{Bt}$ from a (short) infinite series. Check that the derivative of $e^{Bt}$ is $Be^{Bt}$.

20 Starting from $u(0)$ the solution at time $T$ is $e^{AT}u(0)$. Go an additional time $t$ to reach $e^{At}e^{AT}u(0)$. This solution at time $t + T$ can also be written as ____. Conclusion: $e^{At}$ times $e^{AT}$ equals ____.

21 Write $A = \begin{bmatrix}1&4\\0&0\end{bmatrix}$ in the form $X\Lambda X^{-1}$. Find $e^{At}$ from $Xe^{\Lambda t}X^{-1}$.


22 If $A^2 = A$ show that the infinite series produces $e^{At} = I + (e^t - 1)A$. For $A = \begin{bmatrix}1&4\\0&0\end{bmatrix}$ in Problem 21 this gives $e^{At} =$ ____.

23 Generally $e^{A}e^{B}$ is different from $e^{B}e^{A}$. They are both different from $e^{A+B}$. Check this using Problems 21-22 and 19. (If $AB = BA$, all three are the same.)

24 Write $A = \begin{bmatrix}1&1\\0&3\end{bmatrix}$ as $X\Lambda X^{-1}$. Multiply $Xe^{\Lambda t}X^{-1}$ to find the matrix exponential $e^{At}$. Check $e^{At}$ and the derivative of $e^{At}$ when $t = 0$.

25 Put $A = \begin{bmatrix}1&3\\0&0\end{bmatrix}$ into the infinite series to find $e^{At}$. First compute $A^2$ and $A^n$:

$$e^{At} = \begin{bmatrix}1&0\\0&1\end{bmatrix} + \begin{bmatrix}t&3t\\0&0\end{bmatrix} + \cdots = \begin{bmatrix}e^t&\ \ \\0&\ \ \end{bmatrix}.$$

26 (Recommended) Give two reasons why the matrix exponential $e^{At}$ is never singular:

(a) Write down its inverse.

(b) Why are these eigenvalues nonzero? If $Ax = \lambda x$ then $e^{At}x =$ ____ $x$.

27 Find a solution $x(t), y(t)$ that gets large as $t \to \infty$. To avoid this instability a scientist exchanged the two equations:

$$\begin{aligned} dx/dt &= 0x - 4y \\ dy/dt &= -2x + 2y \end{aligned} \quad\text{becomes}\quad \begin{aligned} dy/dt &= -2x + 2y \\ dx/dt &= 0x - 4y. \end{aligned}$$

Now the matrix $\begin{bmatrix}-2&2\\0&-4\end{bmatrix}$ is stable. It has negative eigenvalues. How can this be?

Challenge Problems

28 Centering $y'' = -y$ in Example 3 will produce $Y_{n+1} - 2Y_n + Y_{n-1} = -(\Delta t)^2Y_n$. This can be written as a one-step difference equation for $U = (Y, Z)$:

$$\begin{aligned} Y_{n+1} &= Y_n + \Delta t\, Z_n \\ Z_{n+1} &= Z_n - \Delta t\, Y_{n+1} \end{aligned} \qquad \begin{bmatrix}1&0\\\Delta t&1\end{bmatrix}\begin{bmatrix}Y_{n+1}\\Z_{n+1}\end{bmatrix} = \begin{bmatrix}1&\Delta t\\0&1\end{bmatrix}\begin{bmatrix}Y_n\\Z_n\end{bmatrix}$$

Invert the matrix on the left side to write this as $U_{n+1} = AU_n$. Show that $\det A = 1$. Choose the large time step $\Delta t = 1$ and find the eigenvalues $\lambda_1$ and $\lambda_2 = \overline{\lambda}_1$ of $A$:

$$A = \begin{bmatrix}1&1\\-1&0\end{bmatrix} \quad\text{has}\quad |\lambda_1| = |\lambda_2| = 1. \quad\text{Show that } A^6 \text{ is exactly } I.$$

After 6 steps to $t = 6$, $U_6$ equals $U_0$. The exact $y = \cos t$ returns to 1 at $t = 2\pi$.

29 That centered choice (leapfrog method) in Problem 28 is very successful for small time steps $\Delta t$. But find the eigenvalues of $A$ for $\Delta t = \sqrt2$ and 2:

$$A = \begin{bmatrix}1&\sqrt2\\-\sqrt2&-1\end{bmatrix} \quad\text{and}\quad A = \begin{bmatrix}1&2\\-2&-3\end{bmatrix}.$$

Both matrices have $|\lambda| = 1$. Compute $A^4$ in both cases and find the eigenvectors of $A$. That second value $\Delta t = 2$ is at the border of instability. Any time step $\Delta t > 2$ will lead to $|\lambda| > 1$, and the powers in $U_n = A^nU_0$ will explode.

Note You might say that nobody would compute with $\Delta t > 2$. But if an atom vibrates with $y'' = -1000000y$, then $\Delta t > .002$ will give instability (the frequency is 1000, so the limit is $1000\,\Delta t < 2$). Leapfrog has a very strict stability limit. $Y_{n+1} = Y_n + 3Z_n$ and $Z_{n+1} = Z_n - 3Y_{n+1}$ will explode because $\Delta t = 3$ is too large. The matrix has $|\lambda| > 1$.


30 Another good idea for $y'' = -y$ is the trapezoidal method (half forward/half back). This may be the best way to keep $(Y_n, Z_n)$ exactly on a circle.

Trapezoidal  $$\begin{bmatrix}1&-\Delta t/2\\\Delta t/2&1\end{bmatrix}\begin{bmatrix}Y_{n+1}\\Z_{n+1}\end{bmatrix} = \begin{bmatrix}1&\Delta t/2\\-\Delta t/2&1\end{bmatrix}\begin{bmatrix}Y_n\\Z_n\end{bmatrix}.$$

(a) Invert the left matrix to write this equation as $U_{n+1} = AU_n$. Show that $A$ is an orthogonal matrix: $A^TA = I$. These points $U_n$ never leave the circle. $A = (I - B)^{-1}(I + B)$ is always an orthogonal matrix if $B^T = -B$.

(b) (Optional MATLAB) Take 32 steps from $U_0 = (1, 0)$ to $U_{32}$ with $\Delta t = 2\pi/32$. Is $U_{32} = U_0$? I think there is a small error.


31 The cosine of a matrix is defined like $e^{A}$, by copying the series for $\cos t$:

$$\cos t = 1 - \frac{1}{2!}t^2 + \frac{1}{4!}t^4 - \cdots \qquad \cos A = I - \frac{1}{2!}A^2 + \frac{1}{4!}A^4 - \cdots$$

(a) If $Ax = \lambda x$, multiply each term times $x$ to find the eigenvalue of $\cos A$.

(b) Find the eigenvalues of $A = \begin{bmatrix}\pi&\pi\\\pi&\pi\end{bmatrix}$ with eigenvectors $(1, 1)$ and $(1, -1)$. From the eigenvalues and eigenvectors of $\cos A$, find that matrix $C = \cos A$.

(c) The second derivative of $\cos(At)$ is $-A^2\cos(At)$.

$u(t) = \cos(At)\,u(0)$ solves $\dfrac{d^2u}{dt^2} = -A^2u$ starting from $u'(0) = 0$.

Construct $u(t) = \cos(At)\,u(0)$ by the usual three steps for that specific $A$:

1. Expand $u(0) = (4, 2) = c_1x_1 + c_2x_2$ in the eigenvectors.

2. Multiply those eigenvectors by ____ and ____ (instead of $e^{\lambda t}$).

3. Add up the solution $u(t) = c_1$ ____ $x_1 + c_2$ ____ $x_2$.

32 Explain one of these three proofs that the square of $e^{A}$ is $e^{2A}$.

1. Solving with $e^{A}$ from $t = 0$ to 1 and then 1 to 2 agrees with $e^{2A}$ from 0 to 2.
2. The squared series $(I + A + \frac{A^2}{2} + \cdots)^2$ matches $I + 2A + \frac{(2A)^2}{2} + \cdots = e^{2A}$.
3. If $A$ can be diagonalized then $(Xe^{\Lambda}X^{-1})(Xe^{\Lambda}X^{-1}) = Xe^{2\Lambda}X^{-1}$.


Notes on a Differential Equations Course 


Certainly constant-coefficient linear equations are the simplest to solve. This section 6.3 of the book shows you part of a differential equations course, but there is more:

1. The second order equation $mu'' + bu' + ku = 0$ has major importance in applications. The exponents $\lambda$ in the solutions $u = e^{\lambda t}$ solve $m\lambda^2 + b\lambda + k = 0$. The damping coefficient $b$ is crucial:

Underdamping $b^2 < 4mk$  Critical damping $b^2 = 4mk$  Overdamping $b^2 > 4mk$

This decides whether $\lambda_1$ and $\lambda_2$ are complex roots (underdamping) or repeated roots (critical damping) or real roots (overdamping). With complex $\lambda$'s the solution $u(t)$ oscillates as it decays.

2. Our equations had no forcing term $f(t)$. We were finding the “nullspace solution”. To $u_n(t)$ we need to add a particular solution $u_p(t)$ that balances the force $f(t)$:

Input $f(s)$ at time $s$, growth factor $e^{A(t-s)}$, add up outputs at time $t$:
$$u_{\text{particular}} = \int_0^t e^{A(t-s)}f(s)\,ds.$$

This solution can also be discovered and studied by Laplace transform: that is the established way to convert linear differential equations to linear algebra.


In real applications, nonlinear differential equations are solved numerically. A standard 
method with good accuracy is “Runge-Kutta”—named after its discoverers. Analysis can 
find the constant solutions to du/dt = f (u). Those are solutions u(t) = Y with f(Y) = 0 
and du/dt = 0: no movement. We can also understand stability or instability near u = Y. 
Far from Y, the computer takes over. 


This basic course is the subject of my textbook (companion to this one) on
Differential Equations and Linear Algebra: math.mit.edu/dela.

6.4 Symmetric Matrices

1 A symmetric matrix $S$ has $n$ real eigenvalues $\lambda_i$ and $n$ orthonormal eigenvectors $q_1, \ldots, q_n$.

2 Every real symmetric $S$ can be diagonalized: $S = Q\Lambda Q^{-1} = Q\Lambda Q^T$.

3 The number of positive eigenvalues of $S$ equals the number of positive pivots.

4 Antisymmetric matrices $A = -A^T$ have imaginary $\lambda$'s and orthonormal (complex) $q$'s.

5 Section 9.2 explains why the test $S = S^T$ becomes $S = \overline{S}^T$ for complex matrices.

$S = \begin{bmatrix}0&i\\-i&0\end{bmatrix} = \overline{S}^T$ has real $\lambda = 1, -1$.  $A = \begin{bmatrix}0&i\\i&0\end{bmatrix} = -\overline{A}^T$ has $\lambda = i, -i$.


It is no exaggeration to say that symmetric matrices S are the most important matrices 
the world will ever see—in the theory of linear algebra and also in the applications. We 
come immediately to the key question about symmetry. Not only the question, but also the 
two-part answer. 


What is special about $Sx = \lambda x$ when $S$ is symmetric?

We look for special properties of the eigenvalues $\lambda$ and eigenvectors $x$ when $S = S^T$.

The diagonalization $S = X\Lambda X^{-1}$ will reflect the symmetry of $S$. We get some hint by transposing to $S^T = (X^{-1})^T\Lambda X^T$. Those are the same since $S = S^T$. Possibly $X^{-1}$ in the first form equals $X^T$ in the second form? Then $X^TX = I$. That makes each eigenvector in $X$ orthogonal to the other eigenvectors when $S = S^T$. Here are the key facts:


1. A symmetric matrix has only real eigenvalues. 


2. The eigenvectors can be chosen orthonormal. 


Those n orthonormal eigenvectors go into the columns of X. Every symmetric matrix can
be diagonalized. Its eigenvector matrix X becomes an orthogonal matrix Q. Orthogonal
matrices have Q^{-1} = Q^T: what we suspected about the eigenvector matrix is true. To
remember it we write Q instead of X, when we choose orthonormal eigenvectors.


Why do we use the word “choose”? Because the eigenvectors do not have to be unit
vectors. Their lengths are at our disposal. We will choose unit vectors: eigenvectors of
length one, which are orthonormal and not just orthogonal. Then A = XΛX^{-1} is in its
special and particular form S = QΛQ^T for symmetric matrices.


(Spectral Theorem) Every symmetric matrix has the factorization S = QΛQ^T with real
eigenvalues in Λ and orthonormal eigenvectors in the columns of Q:


Symmetric diagonalization    S = QΛQ^{-1} = QΛQ^T with Q^{-1} = Q^T.    (1)


It is easy to see that QΛQ^T is symmetric. Take its transpose. You get (Q^T)^T Λ^T Q^T, which
is QΛQ^T again. The harder part is to prove that every symmetric matrix has real λ's and
orthonormal x's. This is the “spectral theorem” in mathematics and the “principal axis
theorem” in geometry and physics. We have to prove it! No choice. I will approach the
proof in three steps:


1. By an example, showing real λ's in Λ and orthonormal x's in Q.
2. By a proof of those facts when no eigenvalues are repeated.
3. By a proof that allows repeated eigenvalues (at the end of this section).


Example 1 Find the λ's and x's when S = [1 2; 2 4] and S − λI = [1−λ, 2; 2, 4−λ].

Solution The determinant of S − λI is λ² − 5λ. The eigenvalues are 0 and 5 (both real).
We can see them directly: λ = 0 is an eigenvalue because S is singular, and λ = 5 matches
the trace down the diagonal of S: 0 + 5 agrees with 1 + 4.

Two eigenvectors are (2, −1) and (1, 2): orthogonal but not yet orthonormal. The
eigenvector for λ = 0 is in the nullspace of S. The eigenvector for λ = 5 is in the column
space. We ask ourselves, why are the nullspace and column space perpendicular? The
Fundamental Theorem says that the nullspace is perpendicular to the row space, not the
column space. But our matrix is symmetric! Its row and column spaces are the same. Its
eigenvectors (2, −1) and (1, 2) must be (and are) perpendicular.

These eigenvectors have length √5. Divide them by √5 to get unit vectors. Put those
unit eigenvectors into the columns of Q. Then Q^{-1}SQ is Λ and Q^{-1} = Q^T:


Q^{-1}SQ = (1/√5)[2 −1; 1 2] [1 2; 2 4] (1/√5)[2 1; −1 2] = [0 0; 0 5] = Λ.


Now comes the n by n case. The λ's are real when S = S^T and Sx = λx.


Real Eigenvalues All the eigenvalues of a real symmetric matrix are real.


Proof Suppose that Sx = λx. Until we know otherwise, λ might be a complex number
a + ib (a and b real). Its complex conjugate is λ̄ = a − ib. Similarly the components
of x may be complex numbers, and switching the signs of their imaginary parts gives x̄.
The good thing is that λ̄ times x̄ is always the conjugate of λ times x. So we can take
conjugates of Sx = λx, remembering that S is real:


Sx = λx leads to S x̄ = λ̄ x̄. Transpose to x̄^T S = x̄^T λ̄.

Now take the dot product of the first equation with x̄ and the last equation with x:

x̄^T S x = x̄^T λ x and also x̄^T S x = x̄^T λ̄ x.    (2)


The left sides are the same so the right sides are equal. One equation has λ, the other
has λ̄. They multiply x̄^T x = |x_1|² + |x_2|² + ··· = length squared which is not zero.
Therefore λ must equal λ̄, and a + ib equals a − ib. So b = 0 and λ = a = real. Q.E.D.


The eigenvectors come from solving the real equation (S − λI)x = 0. So the x's are
also real. The important fact is that they are perpendicular.


Orthogonal Eigenvectors Eigenvectors of a real symmetric matrix (when they correspond
to different λ's) are always perpendicular.


Proof Suppose Sx = λ_1 x and Sy = λ_2 y. We are assuming here that λ_1 ≠ λ_2. Take dot
products of the first equation with y and the second with x:


Use S^T = S    (λ_1 x)^T y = (Sx)^T y = x^T S^T y = x^T S y = x^T λ_2 y.    (3)


The left side is x^T λ_1 y, the right side is x^T λ_2 y. Since λ_1 ≠ λ_2, this proves that x^T y = 0.
The eigenvector x (for λ_1) is perpendicular to the eigenvector y (for λ_2).


Example 2 The eigenvectors of a 2 by 2 symmetric matrix have a special form:


Not widely known    S = [a b; b c] has x_1 = [b; λ_1 − a] and x_2 = [λ_2 − c; b].    (4)


This is in the Problem Set. The point here is that x_1 is perpendicular to x_2:


x_1^T x_2 = b(λ_2 − c) + (λ_1 − a)b = b(λ_1 + λ_2 − a − c) = 0.

This is zero because λ_1 + λ_2 equals the trace a + c. Thus x_1^T x_2 = 0. Eagle eyes might
notice the special case S = I, when b and λ_1 − a and λ_2 − c and x_1 and x_2 are all zero.
Then λ_1 = λ_2 = 1 is repeated. But of course S = I has perpendicular eigenvectors.
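Formula (4) can be tried on the matrix S = [1 2; 2 4] of Example 1. This is a small sketch in Python; the function name eig_sym_2x2 is hypothetical, and the eigenvalues are computed from the trace and determinant.

```python
import math

def eig_sym_2x2(a, b, c):
    # Eigenvalues of S = [a b; b c]: roots of lambda^2 - (a + c) lambda + (ac - b^2)
    mean = (a + c) / 2.0
    radius = math.sqrt(((a - c) / 2.0) ** 2 + b * b)
    lam1, lam2 = mean + radius, mean - radius
    # Eigenvectors from equation (4): x1 = (b, lam1 - a), x2 = (lam2 - c, b)
    x1 = (b, lam1 - a)
    x2 = (lam2 - c, b)
    return lam1, lam2, x1, x2

lam1, lam2, x1, x2 = eig_sym_2x2(1.0, 2.0, 4.0)   # S from Example 1
dot = x1[0] * x2[0] + x1[1] * x2[1]                # x1 . x2 should be zero
print(lam1, lam2, x1, x2, dot)
```

For this S the eigenvectors come out as multiples of (1, 2) and (2, −1), the same directions found in Example 1.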


Symmetric matrices S have orthogonal eigenvector matrices Q. Look at this again:

Symmetry    S = XΛX^{-1} becomes S = QΛQ^T with Q^T Q = I.


This says that every 2 by 2 symmetric matrix is (rotation)(stretch)(rotate back):


S = QΛQ^T = [q_1 q_2] [λ_1 0; 0 λ_2] [q_1^T; q_2^T].    (5)


Columns q_1 and q_2 multiply rows λ_1 q_1^T and λ_2 q_2^T to produce S = λ_1 q_1 q_1^T + λ_2 q_2 q_2^T.


Every symmetric matrix    S = QΛQ^T = λ_1 q_1 q_1^T + ··· + λ_n q_n q_n^T.    (6)


Remember the steps to this great result (the spectral theorem).
Section 6.2 Write Ax_i = λ_i x_i in matrix form AX = XΛ or A = XΛX^{-1}
Section 6.4 Orthonormal x_i = q_i gives X = Q    S = QΛQ^{-1} = QΛQ^T


QΛQ^T in equation (6) has columns of QΛ times rows of Q^T. Here is a direct proof.


S has correct eigenvectors
Those q's are orthonormal    Sq_i = (λ_1 q_1 q_1^T + ··· + λ_n q_n q_n^T) q_i = λ_i q_i.    (7)
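The sum of rank-one pieces in equation (6) can be assembled by hand for the 2 by 2 matrix of Example 1. This is only an illustrative sketch in Python, using the eigenvalues 0 and 5 and the unit eigenvectors (2, −1)/√5 and (1, 2)/√5 from that example.

```python
import math

# Eigenvalue / unit eigenvector pairs of S = [1 2; 2 4] (Example 1)
r5 = math.sqrt(5.0)
pairs = [(0.0, (2 / r5, -1 / r5)), (5.0, (1 / r5, 2 / r5))]

def outer_scaled(lam, q):
    # The rank-one matrix lam * q q^T as a list of rows
    return [[lam * q[i] * q[j] for j in range(2)] for i in range(2)]

S = [[0.0, 0.0], [0.0, 0.0]]
for lam, q in pairs:
    piece = outer_scaled(lam, q)
    for i in range(2):
        for j in range(2):
            S[i][j] += piece[i][j]     # add lam_i q_i q_i^T
print(S)
```

Up to roundoff the sum returns [1 2; 2 4]. Only the λ = 5 piece contributes, because the other eigenvalue is zero.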


Complex Eigenvalues of Real Matrices 


For any real matrix, Sx = λx gives S x̄ = λ̄ x̄. For a symmetric matrix, λ and x turn out
to be real. Those two equations become the same. But a nonsymmetric matrix can easily
produce λ and x that are complex. Then A x̄ = λ̄ x̄ is true but different from Ax = λx.
We get another complex eigenvalue (which is λ̄) and a new eigenvector (which is x̄):


For real matrices, complex λ's and x's come in “conjugate pairs.”


λ = a + ib and λ̄ = a − ib    If Ax = λx then A x̄ = λ̄ x̄.    (8)


Example 3    A = [cos θ, −sin θ; sin θ, cos θ] has λ_1 = cos θ + i sin θ and λ_2 = cos θ − i sin θ.

Those eigenvalues are conjugate to each other. They are λ and λ̄. The eigenvectors
must be x and x̄, because A is real:


This is Ax      [cos θ, −sin θ; sin θ, cos θ] [1; −i] = (cos θ + i sin θ) [1; −i]
This is A x̄     [cos θ, −sin θ; sin θ, cos θ] [1; i] = (cos θ − i sin θ) [1; i]        (9)


Those eigenvectors (1, −i) and (1, i) are complex conjugates because A is real.

For this rotation matrix the absolute value is |λ| = 1, because cos² θ + sin² θ = 1.
This fact |λ| = 1 holds for the eigenvalues of every orthogonal matrix Q.
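Equation (9) is easy to confirm with complex arithmetic. In this sketch the angle θ = 0.7 is an arbitrary choice, and the residuals Ax − λx and A x̄ − λ̄ x̄ should vanish up to roundoff.

```python
import cmath
import math

theta = 0.7                          # arbitrary test angle (an assumption)
c, s = math.cos(theta), math.sin(theta)
A = [[c, -s], [s, c]]                # real rotation matrix

def matvec(A, x):
    # 2 by 2 matrix times a vector with (possibly complex) entries
    return [A[0][0] * x[0] + A[0][1] * x[1], A[1][0] * x[0] + A[1][1] * x[1]]

lam = cmath.exp(1j * theta)          # cos(theta) + i sin(theta)
x = [1, -1j]                         # eigenvector for lam
xbar = [1, 1j]                       # conjugate eigenvector for conj(lam)
res1 = [p - lam * q for p, q in zip(matvec(A, x), x)]
res2 = [p - lam.conjugate() * q for p, q in zip(matvec(A, xbar), xbar)]
print(abs(lam), res1, res2)
```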

We apologize that a touch of complex numbers slipped in. They are unavoidable even
when the matrix is real. Chapter 9 goes beyond complex numbers and complex eigenvectors
x to complex matrices A. Then you have the whole picture.

We end with two optional discussions.


Eigenvalues versus Pivots


The eigenvalues of A are very different from the pivots. For eigenvalues, we solve
det(A − λI) = 0. For pivots, we use elimination. The only connection so far is this:


product of pivots = determinant = product of eigenvalues.


We are assuming a full set of pivots d_1,...,d_n. There are n real eigenvalues λ_1,...,λ_n.
The d's and λ's are not the same, but they come from the same symmetric matrix. Those
d's and λ's have a hidden relation. For symmetric matrices the pivots and the eigenvalues
have the same signs:


The number of positive eigenvalues of S = S^T equals the number of positive pivots.
Special case: S has all λ_i > 0 if and only if all pivots are positive.


That special case is an all-important fact for positive definite matrices in Section 6.5. 


Example 4 This symmetric matrix has one positive eigenvalue and one positive pivot:


Matching signs    S = [1 3; 3 1] has pivots 1 and −8, eigenvalues 4 and −2.


The signs of the pivots match the signs of the eigenvalues, one plus and one minus.
This could be false when the matrix is not symmetric:


Opposite signs    B = [1 6; −1 −4] has pivots 1 and 2, eigenvalues −1 and −2.

The diagonal entries are a third set of numbers and we say nothing about them.
Here is a proof that the pivots and eigenvalues have matching signs, when S = S^T.


You see it best when the pivots are divided out of the rows of U. Then S is LDL^T.
The diagonal pivot matrix D goes between triangular matrices L and L^T:


[1 3; 3 1] = [1 0; 3 1] [1 0; 0 −8] [1 3; 0 1]    This is S = LDL^T. It is symmetric.


Watch the eigenvalues of LDL^T when L moves to I. S changes to D.


The eigenvalues of LDL^T are 4 and −2. The eigenvalues of IDI^T are 1 and −8 (the
pivots!). The eigenvalues are changing, as the “3” in L moves to zero. But to change sign,
a real eigenvalue would have to cross zero. The matrix would at that moment be singular.
Our changing matrix always has pivots 1 and −8, so it is never singular. The signs cannot
change, as the λ's move to the d's.

We repeat the proof for any S = LDL^T. Move L toward I, by moving the off-diagonal
entries to zero. The pivots are not changing and not zero. The eigenvalues λ of LDL^T
change to the eigenvalues d of IDI^T. Since these eigenvalues cannot cross zero as they
move into the pivots, their signs cannot change. Same signs for the λ's and d's.
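The moving-L argument can be watched numerically for the matrix of Example 4. This sketch lets the entry 3 in L shrink to 0 in 30 equal steps (the step count is an arbitrary choice) and records both eigenvalues of LDL^T at every step.

```python
import math

def eigs_sym_2x2(a, b, c):
    # Eigenvalues of [a b; b c], larger one first
    mean = (a + c) / 2.0
    radius = math.sqrt(((a - c) / 2.0) ** 2 + b * b)
    return mean + radius, mean - radius

def ldlt(ell, d1, d2):
    # [1 0; ell 1] [d1 0; 0 d2] [1 ell; 0 1] multiplied out: entries a, b, c
    return d1, ell * d1, ell * ell * d1 + d2

history = []
for k in range(31):
    ell = 3.0 * (1 - k / 30.0)            # the "3" in L moves to zero
    a, b, c = ldlt(ell, 1.0, -8.0)        # the pivots 1 and -8 never change
    history.append(eigs_sym_2x2(a, b, c))
print(history[0], history[-1])
```

The first pair is (4, −2) and the last pair is (1, −8). Every pair in between has one positive and one negative number, because the determinant stays at −8 and no eigenvalue can reach zero.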

This connects the two halves of applied linear algebra: pivots and eigenvalues.


All Symmetric Matrices are Diagonalizable


When no eigenvalues of A are repeated, the eigenvectors are sure to be independent.
Then A can be diagonalized. But a repeated eigenvalue can produce a shortage of
eigenvectors. This sometimes happens for nonsymmetric matrices. It never happens
for symmetric matrices. There are always enough eigenvectors to diagonalize S = S^T.


Here is one idea for a proof. Change S slightly by a diagonal matrix diag(c, 2c, ..., nc).
If c is very small, the new symmetric matrix will have no repeated eigenvalues. Then we
know it has a full set of orthonormal eigenvectors. As c → 0 we obtain n orthonormal
eigenvectors of the original S, even if some eigenvalues of that S are repeated.

Every mathematician knows that this argument is incomplete. How do we guarantee 
that the small diagonal matrix will separate the eigenvalues? (I am sure this is true.) 


A different proof comes from a useful new factorization that applies to all square
matrices A, symmetric or not. This new factorization quickly produces S = QΛQ^T with a
full set of real orthonormal eigenvectors when S is any real symmetric matrix.


Every square A factors into QTQ^{-1} where T is upper triangular and Q̄^T = Q^{-1}.
If A has real eigenvalues then Q and T can be chosen real: Q^T Q = I.


This is Schur's Theorem. Its proof will go onto the website math.mit.edu/linearalgebra.
Here I will show how T is diagonal (T = Λ) when S is symmetric. Then S is QΛQ^T.
We know that every symmetric S has real eigenvalues, and Schur allows repeated λ's:
Schur's S = QTQ^{-1} means that T = Q^T S Q. The transpose is again Q^T S Q.
The triangular T is symmetric when S^T = S. Then T must be diagonal and T = Λ.
This proves that S = QΛQ^{-1}. The symmetric S has n orthonormal eigenvectors in Q.


Note. I have added another proof in Section 7.2 of this book. That proof shows how the
eigenvalues λ can be described one at a time. The largest λ_1 is the maximum of x^T S x / x^T x.
Then λ_2 (second largest) is again the same maximum, if we only allow vectors x that
are perpendicular to the first eigenvector. The third eigenvalue λ_3 comes by requiring
x^T q_1 = 0 and x^T q_2 = 0 ...

This proof is placed in Chapter 7 because the same one-at-a-time idea succeeds for the
singular values of any matrix A. Singular values come from A^T A and AA^T.
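The maximum of x^T S x / x^T x can be seen by brute force in the 2 by 2 case. This sketch samples unit vectors around the circle for the matrix S = [1 2; 2 4] of Example 1; the 3600 sample points are an arbitrary choice, so the extremes are only approximate.

```python
import math

S = [[1.0, 2.0], [2.0, 4.0]]     # Example 1: eigenvalues 0 and 5

def rayleigh(S, x):
    # The ratio x^T S x / x^T x for a nonzero vector x
    Sx = [S[0][0] * x[0] + S[0][1] * x[1], S[1][0] * x[0] + S[1][1] * x[1]]
    return (x[0] * Sx[0] + x[1] * Sx[1]) / (x[0] * x[0] + x[1] * x[1])

# Sample x = (cos t, sin t) at 3600 angles around the circle
values = [rayleigh(S, (math.cos(t), math.sin(t)))
          for t in (2 * math.pi * k / 3600 for k in range(3600))]
print(max(values), min(values))
```

The largest sampled ratio is close to λ = 5 and the smallest is close to λ = 0. No sample goes outside that range.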


= REVIEW OF THE KEY IDEAS =


1. Every symmetric matrix S has real eigenvalues and perpendicular eigenvectors.
2. Diagonalization becomes S = QΛQ^T with an orthogonal eigenvector matrix Q.
3. All symmetric matrices are diagonalizable, even with repeated eigenvalues.
4. The signs of the eigenvalues match the signs of the pivots, when S = S^T.
5. Every square matrix can be “triangularized” by A = QTQ^{-1}. If A = S then T = Λ.


= WORKED EXAMPLES =


6.4 A What matrix A has eigenvalues λ = 1, −1 and eigenvectors x_1 = (cos θ, sin θ)
and x_2 = (−sin θ, cos θ)? Which of these properties can be predicted in advance?


A = A^T        A² = I        det A = −1        pivots are + and −        A^{-1} = A


Solution All those properties can be predicted! With real eigenvalues 1, −1 and
orthonormal x_1 and x_2, the matrix A = QΛQ^T must be symmetric. The eigenvalues
1 and −1 tell us that A² = I (since λ² = 1) and A^{-1} = A (same thing) and det A = −1.
The two pivots must be positive and negative like the eigenvalues, since A is symmetric.

The matrix will be a reflection. Vectors in the direction of x_1 are unchanged by A
(since λ = 1). Vectors in the perpendicular direction are reversed (since λ = −1). The
reflection A = QΛQ^T is across the “θ-line”. Write c for cos θ and s for sin θ:


A = [c −s; s c] [1 0; 0 −1] [c s; −s c] = [c² − s², 2cs; 2cs, s² − c²] = [cos 2θ, sin 2θ; sin 2θ, −cos 2θ].

Notice that x = (1, 0) goes to Ax = (cos 2θ, sin 2θ) on the 2θ-line. And (cos 2θ, sin 2θ)
goes back across the θ-line to x = (1, 0).
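The product QΛQ^T in this worked example can be multiplied out numerically. A short sketch in Python, with θ = 0.3 as an arbitrary test angle and a hypothetical helper named reflection:

```python
import math

def reflection(theta):
    # A = Q Lambda Q^T with Q = [c -s; s c] and Lambda = diag(1, -1)
    c, s = math.cos(theta), math.sin(theta)
    Q = [[c, -s], [s, c]]
    lam = [1.0, -1.0]
    # Entry (i, j) of Q Lambda Q^T is the sum over k of Q[i][k] lam[k] Q[j][k]
    return [[sum(Q[i][k] * lam[k] * Q[j][k] for k in range(2))
             for j in range(2)] for i in range(2)]

theta = 0.3                                   # arbitrary test angle
A = reflection(theta)
expected = [[math.cos(2 * theta), math.sin(2 * theta)],
            [math.sin(2 * theta), -math.cos(2 * theta)]]
A2 = [[sum(A[i][k] * A[k][j] for k in range(2)) for j in range(2)]
      for i in range(2)]                      # A squared should be I
print(A, A2)
```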


6.4 B Find the eigenvalues and eigenvectors (discrete sines and cosines) of A_3 and B_4.


A_3 = [2 −1 0; −1 2 −1; 0 −1 2]        B_4 = [1 −1 0 0; −1 2 −1 0; 0 −1 2 −1; 0 0 −1 1]


The −1, 2, −1 pattern in both matrices is a “second difference”. This is like a second
derivative. Then Ax = λx and Bx = λx are like d²x/dt² = λx. This has eigenvectors
x = sin kt and x = cos kt that are the bases for Fourier series.

A_n and B_n lead to “discrete sines” and “discrete cosines” that are the bases for the
Discrete Fourier Transform. This DFT is absolutely central to all areas of digital signal
processing. The favorite choice for JPEG in image processing has been B_8 of size n = 8.


Solution The eigenvalues of A_3 are λ = 2 − √2 and 2 and 2 + √2 (see 6.3 B). Their
sum is 6 (the trace of A_3) and their product is 4 (the determinant). The eigenvector matrix
gives the “Discrete Sine Transform” and the eigenvectors fall onto sine curves.


Sines = [1 √2 1; √2 0 −√2; 1 −√2 1]

Cosines = [1, 1, 1, 1; 1, √2−1, −1, −1−√2; 1, 1−√2, −1, 1+√2; 1, −1, 1, −1]

Sine matrix = Eigenvectors of A_3        Cosine matrix = Eigenvectors of B_4
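The cosine eigenvectors can be checked numerically. This is a sketch under two assumptions that the text does not spell out: B_4 is taken to be the 4 by 4 second difference matrix with diagonal 1, 2, 2, 1, and the eigenvalue that goes with cos(kt) is taken as 2 − 2 cos(kπ/4), which gives 0, 2 − √2, 2, 2 + √2.

```python
import math

# Assumed form of B_4: second differences with corner entries 1
B4 = [[1, -1, 0, 0], [-1, 2, -1, 0], [0, -1, 2, -1], [0, 0, -1, 1]]

def cosine_vector(k, n=4):
    # Samples of cos(k t) at the halfway points t = (j + 1/2) pi / n
    return [math.cos(k * (j + 0.5) * math.pi / n) for j in range(n)]

residuals = []
for k in range(4):
    x = cosine_vector(k)
    lam = 2 - 2 * math.cos(k * math.pi / 4)      # assumed eigenvalue formula
    Bx = [sum(B4[i][j] * x[j] for j in range(4)) for i in range(4)]
    residuals.append(max(abs(Bx[i] - lam * x[i]) for i in range(4)))
print(residuals)
```

Each residual measures the largest entry of Bx − λx. All four are at roundoff level, so the sampled cosines are eigenvectors of this B_4.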


The eigenvalues of B_4 are λ = 2 − √2 and 2 and 2 + √2 and 0 (the same as for A_3, plus
the zero eigenvalue). The trace is still 6, but the determinant is now zero. The eigenvector
matrix C gives the 4-point “Discrete Cosine Transform”. The graph on the Web shows how
the first two eigenvectors fall onto cosine curves. (So do all the eigenvectors of B.) These
eigenvectors match cosines at the halfway points π/8, 3π/8, 5π/8, 7π/8.


Problem Set 6.4


1  Which of these matrices ASB will be symmetric with eigenvalues 1 and −1?


aall lloa) fra] aja a] fro) ll- o 


B = A^T doesn't do it. B = A^{-1} doesn't do it. B = ____ = ____ will succeed.
So B must be an ____ matrix.


2  Suppose S = S^T. When is ASB also symmetric with the same eigenvalues as S?
(a) Transpose ASB to see that it stays symmetric when B = ____.
(b) ASB is similar to S (same eigenvalues) when B = ____.
Put (a) and (b) together. The symmetric matrices similar to S look like (__)S(__).


3  Write A as S + N, symmetric matrix S plus skew-symmetric matrix N:
A = [1 2 4; 4 3 0; 8 6 5] = S + N    (S^T = S and N^T = −N).
For any square matrix, S = ½(A + A^T) and N = ____ add up to A.


4  If C is symmetric prove that A^T C A is also symmetric. (Transpose it.) When A is 6
by 3, what are the shapes of C and A^T C A?


5  Find the eigenvalues and the unit eigenvectors of S = [2 2 2; 2 0 0; 2 0 0].


6  Find an orthogonal matrix Q that diagonalizes S = [−2 6; 6 7]. What is Λ?


7  Find an orthogonal matrix Q that diagonalizes this symmetric matrix:
S = [1 0 2; 0 −1 −2; 2 −2 0].


8  Find all orthogonal matrices that diagonalize S = [9 12; 12 16].


9  (a) Find a symmetric matrix [1 b; b 1] that has a negative eigenvalue.
(b) How do you know it must have a negative pivot?
(c) How do you know it can't have two negative eigenvalues?


10  If A³ = 0 then the eigenvalues of A must be ____. Give an example that has
A ≠ 0. But if A is symmetric, diagonalize it to prove that A must be a zero matrix.


11  If λ = a + ib is an eigenvalue of a real matrix A, then its conjugate λ̄ = a − ib is
also an eigenvalue. (If Ax = λx then also A x̄ = λ̄ x̄: a conjugate pair λ and λ̄.)
Explain why every real 3 by 3 matrix has at least one real eigenvalue.


12  Here is a quick “proof” that the eigenvalues of every real matrix A are real:


False proof    Ax = λx gives x^T A x = λ x^T x so λ = x^T A x / x^T x = real/real is real.


Find the flaw in this reasoning: a hidden assumption that is not justified. You could
test those steps on the 90° rotation matrix [0 −1; 1 0] with λ = i and x = (i, 1).


13  Write S and B in the form λ_1 x_1 x_1^T + λ_2 x_2 x_2^T of the spectral theorem QΛQ^T:


S = [3 1; 1 3]    B = [9 12; 12 16]    (keep ||x_1|| = ||x_2|| = 1).


14  Every 2 by 2 symmetric matrix is λ_1 x_1 x_1^T + λ_2 x_2 x_2^T = λ_1 P_1 + λ_2 P_2. Explain
P_1 + P_2 = x_1 x_1^T + x_2 x_2^T = I from columns times rows of Q. Why is P_1 P_2 = 0?


15  What are the eigenvalues of A = [0 b; −b 0]? Create a 4 by 4 antisymmetric matrix
(A^T = −A) and verify that all its eigenvalues are imaginary.


16  (Recommended) This matrix M is antisymmetric and also ____. Then all its
eigenvalues are pure imaginary and they also have |λ| = 1. (||Mx|| = ||x|| for every
x so ||λx|| = ||x|| for eigenvectors.) Find all four eigenvalues from the trace of M:

M = (1/√3) [0 1 1 1; −1 0 −1 1; −1 1 0 −1; −1 −1 1 0] can only have eigenvalues i or −i.


17  Show that this A (symmetric but complex) has only one line of eigenvectors:
A = [i 1; 1 −i] is not even diagonalizable: eigenvalues λ = 0, 0.


A^T = A is not such a special property for complex matrices. The good property is
Ā^T = A (Section 9.2). Then all λ's are real and the eigenvectors are orthogonal.


18  Even if A is rectangular, the block matrix S = [0 A; A^T 0] is symmetric:


Sx = λx is [0 A; A^T 0] [y; z] = λ [y; z] which is Az = λy, A^T y = λz.


(a) Show that −λ is also an eigenvalue, with the eigenvector (y, −z).
(b) Show that A^T A z = λ² z, so that λ² is an eigenvalue of A^T A.
(c) If A = I (2 by 2) find all four eigenvalues and eigenvectors of S.


19  If A = [1; 1] in Problem 18, find all three eigenvalues and eigenvectors of S.


20  Another proof that eigenvectors are perpendicular when S = S^T. Two steps:


1. Suppose Sx = λx and Sy = 0y and λ ≠ 0. Then y is in the nullspace
and x is in the column space. They are perpendicular because ____. Go
carefully: why are these subspaces orthogonal?


2. If Sy = βy, apply that argument to S − βI. One eigenvalue of S − βI moves
to zero. The eigenvectors x, y stay the same, so they are perpendicular.


21  Find the eigenvector matrices Q for S and X for B. Show that X doesn't collapse
at d = 1, even though λ = 1 is repeated. Are those eigenvectors perpendicular?


S = [0 d 0; d 0 0; 0 0 1]    B = [−d 0 1; 0 1 0; 0 0 d]    have λ = 1, d, −d.


22  Write a 2 by 2 complex matrix with S̄^T = S (a “Hermitian matrix”). Find λ_1 and λ_2
for your complex matrix. Check that x̄_1^T x_2 = 0 (this is complex orthogonality).


23  True (with reason) or false (with example).
(a) A matrix with real eigenvalues and n real eigenvectors is symmetric.
(b) A matrix with real eigenvalues and n orthonormal eigenvectors is symmetric.
(c) The inverse of an invertible symmetric matrix is symmetric.
(d) The eigenvector matrix Q of a symmetric matrix is symmetric.


24  (A paradox for instructors) If AA^T = A^T A then A and A^T share the same eigenvectors
(true). A and A^T always share the same eigenvalues. Find the flaw in this
conclusion: A and A^T must have the same X and same Λ. Therefore A equals A^T.


25  (Recommended) Which of these classes of matrices do A and B belong to:
Invertible, orthogonal, projection, permutation, diagonalizable, Markov?


A = [0 0 1; 0 1 0; 1 0 0]        B = (1/3) [1 1 1; 1 1 1; 1 1 1]


Which of these factorizations are possible for A and B: LU, QR, XΛX^{-1}, QΛQ^T?


26  What number b in A = [2 b; 1 0] makes A = QΛQ^T possible? What number will make
it impossible to diagonalize A? What number makes A singular?


27  Find all 2 by 2 matrices that are orthogonal and also symmetric. Which two numbers
can be eigenvalues of those two matrices?


28  This A is nearly symmetric. But its eigenvectors are far from orthogonal:


A = [1, 10^{-15}; 0, 1 + 10^{-15}] has eigenvectors [1; 0] and [?]


What is the angle between the eigenvectors?


29  (MATLAB) Take two symmetric matrices with different eigenvectors, say A = [1 0; 0 2]
and B = [8 1; 1 0]. Graph the eigenvalues λ_1(A + tB) and λ_2(A + tB) for −8 < t < 8.
Peter Lax says on page 113 of Linear Algebra that λ_1 and λ_2 appear to be on a
collision course at certain values of t. “Yet at the last minute they turn aside.”
How close does λ_1 come to λ_2?


Challenge Problems 


30  For complex matrices, the symmetry S^T = S that produces real eigenvalues must
change in Section 9.2 to S̄^T = S. From det(S − λI) = 0, find the eigenvalues of
the 2 by 2 Hermitian matrix S = [4, 2+i; 2−i, 0] = S̄^T.


31  Normal matrices have N̄^T N = N N̄^T. For real matrices, this is N^T N = NN^T.
Normal includes symmetric, skew-symmetric, and orthogonal (with real λ, imaginary
λ, and |λ| = 1). Other normal matrices can have any complex eigenvalues.


Key point: Normal matrices have n orthonormal eigenvectors. Those vectors x_i
probably will have complex components. In that complex case (Chapter 9)
orthogonality means x̄_i^T x_j = 0. Inner products (dot products) x^T y become x̄^T y.


The test for n orthonormal columns in Q becomes Q̄^T Q = I instead of Q^T Q = I.


N has n orthonormal eigenvectors (N = QΛQ̄^T) if and only if N is normal.


(a) Start from N = QΛQ̄^T with Q̄^T Q = I. Show that N̄^T N = N N̄^T: N is normal.


(b) Now start from N̄^T N = N N̄^T. Schur found A = QTQ̄^T for every matrix A,
with a triangular T. For normal matrices A = N we must show (in 3 steps)
that this triangular matrix T will actually be diagonal. Then T = Λ.


Step 1. Put N = QTQ̄^T into N̄^T N = N N̄^T to find T̄^T T = T T̄^T.
Step 2. Suppose T = [a b; 0 d] has T̄^T T = T T̄^T. Prove that b = 0.
Step 3. Extend Step 2 to size n. Any normal triangular T must be diagonal.


32  If λ_max is the largest eigenvalue of a symmetric matrix S, no diagonal entry can be
larger than λ_max. What is the first entry a_11 of S = QΛQ^T? Show why a_11 ≤ λ_max.


33  Suppose A^T = −A (real antisymmetric matrix). Explain these facts about A:


(a) x^T A x = 0 for every real vector x.
(b) The eigenvalues of A are pure imaginary.
(c) The determinant of A is positive or zero (not negative).

For (a), multiply out an example of x^T A x and watch terms cancel. Or reverse
x^T(Ax) to −(Ax)^T x. For (b), Az = λz leads to z̄^T A z = λ z̄^T z = λ||z||². Part (a)
shows that z̄^T A z = (x − iy)^T A(x + iy) has zero real part. Then (b) helps with (c).


34  If S is symmetric and all its eigenvalues are λ = 2, how do you know that S must be
2I? Key point: Symmetry guarantees that S = QΛQ^T. What is that Λ?


35  Which symmetric matrices S are also orthogonal? Then S^T = S^{-1}.


(a) Show how symmetry and orthogonality lead to S² = I.
(b) What are the possible eigenvalues of this S?
(c) What are the possible eigenvalue matrices Λ? Then S must be QΛQ^T for those
Λ and any orthogonal Q.


36  If S is symmetric, show that A^T S A is also symmetric (take the transpose of A^T S A).
Here A is m by n and S is m by m. Are eigenvalues of S = eigenvalues of A^T S A?
In case A is square and invertible, A^T S A is called congruent to S. They have
the same number of positive, negative, and zero eigenvalues: Law of Inertia.
37  Here is a way to show that a is in between the eigenvalues λ_1 and λ_2 of S:


S = [a b; b c]    det(S − λI) = λ² − aλ − cλ + ac − b²
                  is a parabola opening upwards (because of λ²)


Show that det(S − λI) is negative at λ = a. So the parabola crosses the axis left
and right of λ = a. It crosses at the two eigenvalues of S so they must enclose a.


The n−1 eigenvalues of A always fall between the n eigenvalues of S = [A b; b^T c].


6.5 Positive Definite Matrices


1 Symmetric S: all eigenvalues > 0 ⇔ all pivots > 0 ⇔ all upper left determinants > 0.
2 The matrix S is then positive definite. The energy test is x^T S x > 0 for all vectors x ≠ 0.
3 One more test for positive definiteness: S = A^T A with independent columns in A.
4 Positive semidefinite S allows λ = 0, pivot = 0, determinant = 0, energy x^T S x = 0.
5 The equation x^T S x = 1 gives an ellipse in R^n when S is symmetric positive definite.


This section concentrates on symmetric matrices that have positive eigenvalues. If
symmetry makes a matrix important, this extra property (all λ > 0) makes it truly special.
When we say special, we don't mean rare. Symmetric matrices with positive eigenvalues
are at the center of all kinds of applications. They are called positive definite.

The first problem is to recognize positive definite matrices. You may say, just find all the
eigenvalues and test λ > 0. That is exactly what we want to avoid. Calculating eigenvalues
is work. When the λ's are needed, we can compute them. But if we just want to know that
all the λ's are positive, there are faster ways. Here are two goals of this section:


• To find quick tests on a symmetric matrix that guarantee positive eigenvalues.
• To explain important applications of positive definiteness.

Every eigenvalue is real because the matrix is symmetric.


Start with 2 by 2. When does S = [a b; b c] have λ_1 > 0 and λ_2 > 0?


Test: The eigenvalues of S are positive if and only if a > 0 and ac − b² > 0.


S_1 = [1 2; 2 1] is not positive definite because ac − b² = 1 − 4 < 0
S_2 = [1 −2; −2 6] is positive definite because a = 1 and ac − b² = 6 − 4 > 0
S_3 = [−1 2; 2 −6] is not positive definite (even with det = +2) because a = −1


The eigenvalues 3 and −1 of S_1 confirm that S_1 is not positive definite. Positive trace
3 − 1 = 2, but negative determinant (3)(−1) = −3. And S_3 = −S_2 is negative definite.
Two positive eigenvalues for S_2, two negative eigenvalues for S_3.


Proof that the 2 by 2 test is passed when λ_1 > 0 and λ_2 > 0. Their product λ_1 λ_2 is the
determinant so ac − b² > 0. Their sum λ_1 + λ_2 is the trace so a + c > 0. Then a and c are
both positive (if a or c is not positive, ac − b² > 0 will fail). Problem 1 reverses the
reasoning to show that the tests a > 0 and ac > b² guarantee λ_1 > 0 and λ_2 > 0.
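The 2 by 2 test takes one line of code. A minimal sketch in Python, applied to the three matrices S_1, S_2, S_3 above (each stored by its entries a, b, c; the function name is hypothetical):

```python
def is_positive_definite_2x2(a, b, c):
    # Determinant test for S = [a b; b c]: a > 0 and ac - b^2 > 0
    return a > 0 and a * c - b * b > 0

S1 = (1, 2, 1)       # entries a, b, c of [1 2; 2 1]
S2 = (1, -2, 6)      # [1 -2; -2 6]
S3 = (-1, 2, -6)     # [-1 2; 2 -6]
results = [is_positive_definite_2x2(*S) for S in (S1, S2, S3)]
print(results)       # [False, True, False]
```

No eigenvalues were computed. That is the point of the test.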

This test uses the 1 by 1 determinant a and the 2 by 2 determinant ac − b². When S is
3 by 3, det S > 0 is the third part of the test. The next test requires positive pivots.


Test: The eigenvalues of S are positive if and only if the pivots are positive:


a > 0 and (ac − b²)/a > 0.


a > 0 is required in both tests. So ac > b² is also required, for the determinant test and
now the pivot test. The point is to recognize that ratio as the second pivot of S:


[a b; b c] → [a, b; 0, c − (b/a)b]    The first pivot is a. The multiplier is b/a.
                                      The second pivot is c − b²/a = (ac − b²)/a.


This connects two big parts of linear algebra. Positive eigenvalues mean positive pivots
and vice versa. Each pivot is a ratio of upper left determinants. The pivots give a quick test
for λ > 0, and they are a lot faster to compute than the eigenvalues. It is very satisfying to
see pivots and determinants and eigenvalues come together in this course.


3 by 3 example    S = [2 1 1; 1 2 1; 1 1 2] is positive definite
eigenvalues 1, 1, 4        determinants 2 and 3 and 4        pivots 2 and 3/2 and 4/3


S − I will be semidefinite: eigenvalues 0, 0, 3. S − 2I is indefinite because λ = −1, −1, 2.
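The pivots and upper left determinants of the 3 by 3 example can be produced by exact elimination. This is a sketch in Python using exact fractions; it assumes no row exchanges are needed (every pivot is nonzero), which holds for this S.

```python
from fractions import Fraction

def pivots(S):
    # Elimination without row exchanges; assumes every pivot is nonzero
    U = [[Fraction(v) for v in row] for row in S]
    n = len(U)
    for k in range(n):
        for i in range(k + 1, n):
            m = U[i][k] / U[k][k]               # multiplier for row i
            for j in range(k, n):
                U[i][j] -= m * U[k][j]
    return [U[k][k] for k in range(n)]

S = [[2, 1, 1], [1, 2, 1], [1, 1, 2]]
d = pivots(S)
dets = []
running = Fraction(1)
for p in d:
    running *= p          # upper left determinant = product of the pivots so far
    dets.append(running)
print(d, dets)
```

The pivots come out as 2, 3/2, 4/3 and the running products are 2, 3, 4. Each pivot is the ratio of two consecutive upper left determinants.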
Now comes a different way to look at symmetric matrices with positive eigenvalues. 


Energy-based Definition 


From Sx = λx, multiply by x^T to get x^T S x = λ x^T x. The right side is a positive λ times
a positive number x^T x = ||x||². So the left side x^T S x is positive for any eigenvector.


Important point: The new idea is that x^T S x is positive for all nonzero vectors x,
not just the eigenvectors. In many applications this number x^T S x (or ½ x^T S x) is the
energy in the system. The requirement of positive energy gives another definition of a
positive definite matrix. I think this energy-based definition is the fundamental one.
Eigenvalues and pivots are two equivalent ways to test the new requirement x^T S x > 0.


x^T S x = [x y] [a b; b c] [x; y] = ax² + 2bxy + cy² > 0.


The four entries a, b, b, c give the four parts of x^T S x. From a and c come the pure squares
ax² and cy². From b and b off the diagonal come the cross terms bxy and byx (the same).
Adding those four parts gives x^T S x. This energy-based definition leads to a basic fact:


If S and T are symmetric positive definite, so is S + T.


Reason: x^T(S + T)x is simply x^T S x + x^T T x. Those two terms are positive (for x ≠ 0)
so S + T is also positive definite. The pivots and eigenvalues are not easy to follow when
matrices are added, but the energies just add.


x^T S x also connects with our final way to recognize a positive definite matrix.
Start with any matrix A, possibly rectangular. We know that S = A^T A is square and
symmetric. More than that, S will be positive definite when A has independent columns:


Test: If the columns of A are independent, then S = A^T A is positive definite.


Again eigenvalues and pivots are not easy. But the number x^T S x is the same as x^T A^T A x.
x^T A^T A x is exactly (Ax)^T(Ax) = ||Ax||²: another important proof by parenthesis!
That vector Ax is not zero when x ≠ 0 (this is the meaning of independent columns).
Then x^T S x is the positive number ||Ax||² and the matrix S is positive definite.

Let me collect this theory together, into five equivalent statements of positive definiteness.
You will see how that key idea connects the whole subject of linear algebra: pivots,
determinants, eigenvalues, and least squares (from A^T A). Then come the applications.


1. All n pivots of S are positive. 
2. All n upper left determinants are positive. 
3. All n eigenvalues of S are positive. 


4. x^T S x is positive except at x = 0. This is the energy-based definition.


5. S equals ATA for a matrix A with independent columns. 


The “upper left determinants” are 1 by 1, 2 by 2, ..., n by n. The last one is the determinant
of the complete matrix S. This theorem ties together the whole linear algebra course.


Example 1 Test these symmetric matrices S and T for positive definiteness:


S = [2 −1 0; −1 2 −1; 0 −1 2]    and    T = [2 −1 b; −1 2 −1; b −1 2]


Solution The pivots of S are 2 and 3/2 and 4/3, all positive. Its upper left determinants are 2
and 3 and 4, all positive. The eigenvalues of S are 2 − √2 and 2 and 2 + √2, all positive.
That completes tests 1, 2, and 3. Any one test is decisive!

I have three candidates A_1, A_2, A_3 to suggest for S = A^T A. They all show that S is
positive definite. A_1 is a first difference matrix, 4 by 3, to produce −1, 2, −1 in S:


S = A_1^T A_1    [2 −1 0; −1 2 −1; 0 −1 2] = [1 −1 0 0; 0 1 −1 0; 0 0 1 −1] [1 0 0; −1 1 0; 0 −1 1; 0 0 −1]


The three columns of A_1 are independent. Therefore S is positive definite.
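$A_2$ comes from $S = LDL^T$ (the symmetric version of $S = LU$). Elimination gives
the pivots 2, 3/2, 4/3 in $D$ and the multipliers $-1/2$, 0, $-2/3$ in $L$. Just put $A_2 = \sqrt{D}\, L^T$.

$$LDL^T = \begin{bmatrix} 1 & & \\ -\tfrac{1}{2} & 1 & \\ 0 & -\tfrac{2}{3} & 1 \end{bmatrix} \begin{bmatrix} 2 & & \\ & \tfrac{3}{2} & \\ & & \tfrac{4}{3} \end{bmatrix} \begin{bmatrix} 1 & -\tfrac{1}{2} & 0 \\ & 1 & -\tfrac{2}{3} \\ & & 1 \end{bmatrix} = (L\sqrt{D})(L\sqrt{D})^T = A_2^T A_2$$

$A_2$ is the Cholesky factor of $S$.

This triangular choice of $A$ has square roots (not so beautiful). It is the “Cholesky factor”
of $S$ and the MATLAB command is A = chol(S). In applications, the rectangular $A_1$ is
how we build $S$ and this Cholesky $A_2$ is how we break it apart.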
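
For readers without MATLAB, here is a rough plain-Python sketch of an upper triangular Cholesky factor. The function name `cholesky_upper` is a hypothetical helper; it is meant to mirror the convention $S = C^T C$ with $C$ upper triangular, not to reproduce MATLAB's implementation.

```python
import math

def cholesky_upper(S):
    """Upper triangular C with C^T C = S, for symmetric positive definite S."""
    n = len(S)
    C = [[0.0] * n for _ in range(n)]
    for i in range(n):
        # Diagonal entry: what is left of S[i][i] after the earlier rows.
        d = S[i][i] - sum(C[k][i] ** 2 for k in range(i))
        if d <= 0:
            raise ValueError("matrix is not positive definite")
        C[i][i] = math.sqrt(d)
        for j in range(i + 1, n):
            s = S[i][j] - sum(C[k][i] * C[k][j] for k in range(i))
            C[i][j] = s / C[i][i]
    return C

S = [[2, -1, 0], [-1, 2, -1], [0, -1, 2]]
C = cholesky_upper(S)

# Rebuild S from C^T C to confirm the factorization.
rebuilt = [[sum(C[k][i] * C[k][j] for k in range(3)) for j in range(3)]
           for i in range(3)]
print([round(C[i][i] ** 2, 6) for i in range(3)])   # squared diagonal = pivots
```

The squared diagonal entries of `C` are the pivots 2, 3/2, 4/3, which is where the square roots come from.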


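Eigenvalues give the symmetric choice $A_3 = Q \sqrt{\Lambda}\, Q^T$. This is also successful with
$A_3^T A_3 = Q \Lambda Q^T = S$. All tests show that the $-1, 2, -1$ matrix $S$ is positive definite.


To see that the energy $x^T S x$ is positive, we can write it as a sum of squares. The three
choices $A_1, A_2, A_3$ give three different ways to split up $x^T S x$:

$$\begin{aligned}
x^T S x &= 2x_1^2 - 2x_1 x_2 + 2x_2^2 - 2x_2 x_3 + 2x_3^2 && \text{Rewrite with squares} \\
\|A_1 x\|^2 &= x_1^2 + (x_2 - x_1)^2 + (x_3 - x_2)^2 + x_3^2 && \text{Using differences in } A_1 \\
\|A_2 x\|^2 &= 2\left(x_1 - \tfrac{1}{2} x_2\right)^2 + \tfrac{3}{2}\left(x_2 - \tfrac{2}{3} x_3\right)^2 + \tfrac{4}{3} x_3^2 && \text{Using } S = LDL^T \\
\|A_3 x\|^2 &= \lambda_1 (q_1^T x)^2 + \lambda_2 (q_2^T x)^2 + \lambda_3 (q_3^T x)^2 && \text{Using } S = Q \Lambda Q^T
\end{aligned}$$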

Now turn to $T$ (defined in Example 1). The (1, 3) and (3, 1) entries move away from 0 to $b$.
This $b$ must not be too large! The determinant test is easiest. The 1 by 1 determinant is 2,
the 2 by 2 determinant is still 3. The 3 by 3 determinant involves $b$:


Test on $T$: $\det T = 4 + 2b - 2b^2 = (1 + b)(4 - 2b)$ must be positive.


At $b = -1$ and $b = 2$ we get $\det T = 0$. Between $b = -1$ and $b = 2$ this matrix $T$
is positive definite. The corner entry $b = 0$ in the matrix $S$ was safely between $-1$ and 2.
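
The determinant of $T$ can be checked by a cofactor expansion along the first row. This worked step is an added illustration that uses only the entries of $T$ from Example 1:

```latex
\det T
= 2 \begin{vmatrix} 2 & -1 \\ -1 & 2 \end{vmatrix}
- (-1) \begin{vmatrix} -1 & -1 \\ b & 2 \end{vmatrix}
+ b \begin{vmatrix} -1 & 2 \\ b & -1 \end{vmatrix}
= 2(3) + (b - 2) + b(1 - 2b)
= 4 + 2b - 2b^2 .
```

Setting $b = 0$ gives 4, the determinant of $S$.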
Positive Semidefinite Matrices


Often we are at the edge of positive definiteness. The determinant is zero. The smallest
eigenvalue is zero. The energy in its eigenvector is $x^T S x = x^T 0 x = 0$. These matrices
on the edge are called positive semidefinite. Here are two examples (not invertible):

$$S = \begin{bmatrix} 1 & 2 \\ 2 & 4 \end{bmatrix} \quad \text{and} \quad T = \begin{bmatrix} 2 & -1 & -1 \\ -1 & 2 & -1 \\ -1 & -1 & 2 \end{bmatrix} \quad \text{are positive semidefinite.}$$

$S$ has eigenvalues 5 and 0. Its upper left determinants are 1 and 0. Its rank is only 1. This
matrix $S$ factors into $A^T A$ with dependent columns in $A$:

Dependent columns in $A$, positive semidefinite $S$:

$$\begin{bmatrix} 1 & 2 \\ 2 & 4 \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 2 & 0 \end{bmatrix} \begin{bmatrix} 1 & 2 \\ 0 & 0 \end{bmatrix} = A^T A$$

If 4 is increased by any small number, the matrix $S$ will become positive definite.

The cyclic $T$ also has zero determinant (computed above when $b = -1$). $T$ is singular.
The eigenvector $x = (1, 1, 1)$ has $Tx = 0$ and energy $x^T T x = 0$. Vectors $x$ in all other
directions do give positive energy. This $T$ can be written as $A^T A$ in many ways, but $A$
will always have dependent columns, with $(1, 1, 1)$ in its nullspace:

Second differences $T$ from first differences $A$, cyclic $T$ from cyclic $A$:

$$\begin{bmatrix} 2 & -1 & -1 \\ -1 & 2 & -1 \\ -1 & -1 & 2 \end{bmatrix} = \begin{bmatrix} 1 & -1 & 0 \\ 0 & 1 & -1 \\ -1 & 0 & 1 \end{bmatrix} \begin{bmatrix} 1 & 0 & -1 \\ -1 & 1 & 0 \\ 0 & -1 & 1 \end{bmatrix}$$

Positive semidefinite matrices have all $\lambda \geq 0$ and all $x^T S x \geq 0$. Those weak inequalities
($\geq$ instead of $>$) include positive definite $S$ and also the singular matrices at the edge.
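
The zero energy along $(1, 1, 1)$ can be read off from the cyclic first differences. This is an added worked step, using the cyclic difference matrix $A$ from the factorization of $T$:

```latex
x^T T x = \|Ax\|^2 = (x_1 - x_3)^2 + (x_2 - x_1)^2 + (x_3 - x_2)^2 \geq 0 .
```

All three differences vanish exactly when $x_1 = x_2 = x_3$, which is the line through $(1, 1, 1)$.
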

The Ellipse $ax^2 + 2bxy + cy^2 = 1$


Think of a tilted ellipse $x^T S x = 1$. Its center is (0, 0), as in Figure 6.7a. Turn it to line up
with the coordinate axes ($X$ and $Y$ axes). That is Figure 6.7b. These two pictures show the
geometry behind the factorization $S = Q \Lambda Q^{-1} = Q \Lambda Q^T$:


1. The tilted ellipse is associated with $S$. Its equation is $x^T S x = 1$.
2. The lined-up ellipse is associated with $\Lambda$. Its equation is $X^T \Lambda X = 1$.
3. The rotation matrix that lines up the ellipse is the eigenvector matrix $Q$.


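Example 2 Find the axes of this tilted ellipse $5x^2 + 8xy + 5y^2 = 1$.


Solution Start with the positive definite matrix that matches this equation:

$$\text{The equation is } \begin{bmatrix} x & y \end{bmatrix} \begin{bmatrix} 5 & 4 \\ 4 & 5 \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix} = 1. \qquad \text{The matrix is } S = \begin{bmatrix} 5 & 4 \\ 4 & 5 \end{bmatrix}.$$

Figure 6.7: The tilted ellipse $5x^2 + 8xy + 5y^2 = 1$. Lined up it is $9X^2 + Y^2 = 1$.


The eigenvectors are $\begin{bmatrix} 1 \\ 1 \end{bmatrix}$ and $\begin{bmatrix} 1 \\ -1 \end{bmatrix}$. Divide by $\sqrt{2}$ for unit vectors. Then $S = Q \Lambda Q^T$:

Eigenvectors in $Q$, eigenvalues 9 and 1:

$$\begin{bmatrix} 5 & 4 \\ 4 & 5 \end{bmatrix} = \frac{1}{\sqrt{2}} \begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix} \begin{bmatrix} 9 & 0 \\ 0 & 1 \end{bmatrix} \frac{1}{\sqrt{2}} \begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix}$$

Now multiply by $\begin{bmatrix} x & y \end{bmatrix}$ on the left and $\begin{bmatrix} x \\ y \end{bmatrix}$ on the right to get $x^T S x = (x^T Q) \Lambda (Q^T x)$:

$$x^T S x = \text{sum of squares} \qquad 5x^2 + 8xy + 5y^2 = 9 \left( \frac{x + y}{\sqrt{2}} \right)^2 + 1 \left( \frac{x - y}{\sqrt{2}} \right)^2. \tag{2}$$

The coefficients are not the pivots 5 and 9/5 from $D$, they are the eigenvalues 9 and 1
from $\Lambda$. Inside the squares are the eigenvectors $q_1 = (1, 1)/\sqrt{2}$ and $q_2 = (1, -1)/\sqrt{2}$.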


The axes of the tilted ellipse point along those eigenvectors. This explains why
$S = Q \Lambda Q^T$ is called the “principal axis theorem”: it displays the axes. Not only the
axis directions (from the eigenvectors) but also the axis lengths (from the eigenvalues).
To see it all, use capital letters for the new coordinates that line up the ellipse:

$$\text{Lined up} \qquad \frac{x + y}{\sqrt{2}} = X \quad \text{and} \quad \frac{x - y}{\sqrt{2}} = Y \quad \text{and} \quad 9X^2 + Y^2 = 1.$$

The largest value of $X^2$ is 1/9. The endpoint of the shorter axis has $X = 1/3$ and $Y = 0$.
Notice: The bigger eigenvalue $\lambda_1 = 9$ gives the shorter axis, of half-length $1/\sqrt{\lambda_1} = 1/3$.
The smaller eigenvalue $\lambda_2 = 1$ gives the greater length $1/\sqrt{\lambda_2} = 1$.


In the $xy$ system, the axes are along the eigenvectors of $S$. In the $XY$ system, the
axes are along the eigenvectors of $\Lambda$ (the coordinate axes). All comes from $S = Q \Lambda Q^T$.
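
A short numerical check of the lined-up form may help. This sketch evaluates both sides of the sum of squares at a few arbitrary sample points (the points are assumptions chosen only for illustration) and at the ends of the two axes.

```python
import math

def energy(x, y):
    """x^T S x for S = [[5, 4], [4, 5]]."""
    return 5 * x * x + 8 * x * y + 5 * y * y

def lined_up(x, y):
    """The same energy in the rotated coordinates X, Y."""
    X = (x + y) / math.sqrt(2)
    Y = (x - y) / math.sqrt(2)
    return 9 * X * X + 1 * Y * Y

for x, y in [(1.0, 0.0), (0.3, -0.7), (-2.0, 1.5)]:   # arbitrary sample points
    assert math.isclose(energy(x, y), lined_up(x, y))

# Ends of the axes: half-lengths 1/sqrt(9) and 1/sqrt(1) along q1 and q2.
short_end = (1 / 3 / math.sqrt(2), 1 / 3 / math.sqrt(2))    # (1/3) q1
long_end = (1 / math.sqrt(2), -1 / math.sqrt(2))            # (1) q2
print(energy(*short_end), energy(*long_end))                # both close to 1
```

Both endpoints lie on the ellipse, so the half-lengths 1/3 and 1 are confirmed.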
$S = Q \Lambda Q^T$ is positive definite when all $\lambda_i > 0$. The graph of $x^T S x = 1$ is an ellipse:

$$\text{Ellipse} \qquad \begin{bmatrix} x & y \end{bmatrix} Q \Lambda Q^T \begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} X & Y \end{bmatrix} \Lambda \begin{bmatrix} X \\ Y \end{bmatrix} = \lambda_1 X^2 + \lambda_2 Y^2 = 1. \tag{3}$$

The axes point along eigenvectors of $S$. The half-lengths are $1/\sqrt{\lambda_1}$ and $1/\sqrt{\lambda_2}$.


$S = I$ gives the circle $x^2 + y^2 = 1$. If one eigenvalue is negative (exchange 4’s and 5’s
in $S$), the ellipse changes to a hyperbola. The sum of squares becomes a difference of
squares: $9X^2 - Y^2 = 1$. For a negative definite matrix like $S = -I$, with both $\lambda$’s
negative, the graph of $-x^2 - y^2 = 1$ has no points at all.


If $S$ is $n$ by $n$, $x^T S x = 1$ is an “ellipsoid” in $\mathbf{R}^n$. Its axes are the eigenvectors of $S$.


Important Application: Test for a Minimum

Does $F(x, y)$ have a minimum if $\partial F/\partial x = 0$ and $\partial F/\partial y = 0$ at the point $(x, y) = (0, 0)$?


For $f(x)$, the test for a minimum comes from calculus: $df/dx$ is zero and $d^2 f/dx^2 > 0$.
Two variables in $F(x, y)$ produce a symmetric matrix $S$. It contains four second derivatives.
Positive $d^2 f/dx^2$ changes to positive definite $S$:

$$\text{Second derivatives} \qquad S = \begin{bmatrix} \partial^2 F/\partial x^2 & \partial^2 F/\partial x \partial y \\ \partial^2 F/\partial y \partial x & \partial^2 F/\partial y^2 \end{bmatrix}$$

$F(x, y)$ has a minimum if $\partial F/\partial x = \partial F/\partial y = 0$ and $S$ is positive definite.


Reason: $S$ reveals the all-important terms $ax^2 + 2bxy + cy^2$ near $(x, y) = (0, 0)$.
The second derivatives of $F$ are $2a, 2b, 2b, 2c$. For $F(x, y, z)$ the matrix $S$ will be 3 by 3.
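
The 2 by 2 test can be written as a few lines of code. This is a sketch, assuming the first derivatives are already known to be zero at the point; the function name and the three sample quadratics are illustrative choices.

```python
def classify_critical_point(fxx, fxy, fyy):
    """Classify a point where both first derivatives are zero,
    using the determinant tests on S = [[fxx, fxy], [fxy, fyy]]."""
    det = fxx * fyy - fxy * fxy
    if fxx > 0 and det > 0:
        return "minimum"        # S is positive definite
    if fxx < 0 and det > 0:
        return "maximum"        # -S is positive definite
    if det < 0:
        return "saddle"         # eigenvalues of opposite sign
    return "inconclusive"       # semidefinite edge case, det = 0

# F = a x^2 + 2b x y + c y^2 has second derivatives 2a, 2b, 2c at (0, 0).
print(classify_critical_point(2 * 1, 2 * 1, 2 * 2))   # F = x^2 + 2xy + 2y^2
print(classify_critical_point(2 * 1, 2 * 2, 2 * 3))   # F = x^2 + 4xy + 3y^2
print(classify_critical_point(2 * 1, 2 * 1, 2 * 1))   # F = (x + y)^2
```

The last case sits on the semidefinite edge, where second derivatives alone do not decide.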


= REVIEW OF THE KEY IDEAS =


1. Positive definite matrices have positive eigenvalues and positive pivots.
2. A quick test is given by the upper left determinants: $a > 0$ and $ac - b^2 > 0$.
3. The graph of the energy $x^T S x$ is then a “bowl” going up from $x = 0$:
   $x^T S x = ax^2 + 2bxy + cy^2$ is positive except at $(x, y) = (0, 0)$.
4. $S = A^T A$ is automatically positive definite if $A$ has independent columns.
5. The ellipsoid $x^T S x = 1$ has its axes along the eigenvectors of $S$. Lengths $1/\sqrt{\lambda}$.
6. Minimum of $F(x, y)$ if $\partial F/\partial x = \partial F/\partial y = 0$ and 2nd derivative matrix is positive definite.


= WORKED EXAMPLES =


6.5 A The great factorizations of a symmetric matrix are $S = LDL^T$ from pivots and
multipliers, and $S = Q \Lambda Q^T$ from eigenvalues and eigenvectors. Try these $n$ by $n$ tests
on pascal(6) and ones(6) and hilb(6) and other matrices in MATLAB’s gallery.


pascal(6) is positive definite because all its pivots are 1 (Worked Example 2.6 A).
ones(6) is positive semidefinite because its eigenvalues are 0, 0, 0, 0, 0, 6.
H = hilb(6) is positive definite even though eig(H) shows eigenvalues very near zero.
Hilbert matrix: $x^T H x = \int_0^1 (x_1 + x_2 s + \cdots + x_6 s^5)^2 \, ds > 0$, $H_{ij} = 1/(i + j - 1)$.
rand(6) + rand(6)' can be positive definite or not. Experiments gave only 2 in 20000.

n = 20000; p = 0; for k = 1:n, A = rand(6); p = p + all(eig(A + A') > 0); end, p/n
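
The same experiment can be sketched without MATLAB. Note the swapped technique: instead of `eig`, this version uses the pivot test (positive definite exactly when elimination without row exchanges gives all positive pivots). The seed and the smaller trial count are arbitrary assumptions to keep the run short.

```python
import random

def is_positive_definite(S):
    """Pivot test: elimination without row exchanges, every pivot must be > 0."""
    M = [row[:] for row in S]
    n = len(M)
    for k in range(n):
        if M[k][k] <= 0:
            return False
        for i in range(k + 1, n):
            mult = M[i][k] / M[k][k]
            for j in range(k, n):
                M[i][j] -= mult * M[k][j]
    return True

def random_symmetric(n, rng):
    """A + A^T with entries of A uniform on [0, 1), like rand(n) + rand(n)'."""
    A = [[rng.random() for _ in range(n)] for _ in range(n)]
    return [[A[i][j] + A[j][i] for j in range(n)] for i in range(n)]

rng = random.Random(0)          # fixed seed, an arbitrary choice for repeatability
trials = 2000                   # fewer trials than the text
hits = sum(is_positive_definite(random_symmetric(6, rng)) for _ in range(trials))
print(hits / trials)            # expected to be very small

ones6 = [[1.0] * 6 for _ in range(6)]
hilb6 = [[1.0 / (i + j + 1) for j in range(6)] for i in range(6)]
print(is_positive_definite(ones6), is_positive_definite(hilb6))
```

The all-ones matrix fails because its second pivot is zero (semidefinite), while the Hilbert matrix passes with small positive pivots.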
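
6.5 B When is the symmetric block matrix $M = \begin{bmatrix} A & B \\ B^T & C \end{bmatrix}$ positive definite?


Solution Multiply the first row of $M$ by $B^T A^{-1}$ and subtract from the second row, to
get a block of zeros. The Schur complement $S = C - B^T A^{-1} B$ appears in the corner:

$$\begin{bmatrix} I & 0 \\ -B^T A^{-1} & I \end{bmatrix} \begin{bmatrix} A & B \\ B^T & C \end{bmatrix} = \begin{bmatrix} A & B \\ 0 & C - B^T A^{-1} B \end{bmatrix} = \begin{bmatrix} A & B \\ 0 & S \end{bmatrix} \tag{4}$$

Those two blocks $A$ and $S$ must be positive definite. Their pivots are the pivots of $M$.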


6.5 C Find the eigenvalues of the $-1, 2, -1$ tridiagonal $n$ by $n$ matrix $S$ (my favorite).


Solution The best way is to guess $\lambda$ and $x$. Then check $Sx = \lambda x$. Guessing could not
work for most matrices, but special cases are a big part of mathematics (pure and applied).

The key is hidden in a differential equation. The second difference matrix $S$ is like a
second derivative, and those eigenvalues are much easier to see:

$$\begin{array}{l} \text{Eigenvalues } \lambda_1, \lambda_2, \ldots \\ \text{Eigenfunctions } y_1, y_2, \ldots \end{array} \qquad \frac{d^2 y}{dx^2} = \lambda y(x) \quad \text{with} \quad \begin{array}{l} y(0) = 0 \\ y(1) = 0 \end{array} \tag{5}$$

Try $y = \sin cx$. Its second derivative is $y'' = -c^2 \sin cx$. So the eigenvalue in (5) will be
$\lambda = -c^2$, provided $y = \sin cx$ satisfies the end point conditions $y(0) = 0 = y(1)$.

Certainly $\sin 0 = 0$ (this is where cosines are eliminated). At the other end $x = 1$,
we need $y(1) = \sin c = 0$. The number $c$ must be $k\pi$, a multiple of $\pi$. Then $\lambda$ is $-k^2 \pi^2$:

$$\begin{array}{l} \text{Eigenvalues } \lambda = -k^2 \pi^2 \\ \text{Eigenfunctions } y = \sin k \pi x \end{array} \qquad \frac{d^2}{dx^2} \sin k \pi x = -k^2 \pi^2 \sin k \pi x. \tag{6}$$

Now we go back to the matrix $S$ and guess its eigenvectors. They come from $\sin k \pi x$
at $n$ points $x = h, 2h, \ldots, nh$, equally spaced between 0 and 1. The spacing $\Delta x$ is $h = 1/(n + 1)$,
so the $(n + 1)$st point has $(n + 1)h = 1$. Multiply that sine vector $x$ by $S$:

$$\begin{array}{ll} \text{Eigenvalue of } S \text{ is positive} & Sx = \lambda_k x = (2 - 2 \cos k \pi h)\, x \\ \text{Eigenvector of } S \text{ is sine vector} & x = (\sin k \pi h, \ldots, \sin n k \pi h). \end{array} \tag{7}$$
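
The guess in (7) can be confirmed numerically. This is a minimal sketch for $n = 5$ (an illustrative size); the helper names are assumptions, and the matrix is applied as a second difference with zero values at both ends.

```python
import math

def second_difference_times(x):
    """Multiply x by the -1, 2, -1 tridiagonal matrix S (zero boundary values)."""
    n = len(x)
    padded = [0.0] + list(x) + [0.0]
    return [-padded[i - 1] + 2 * padded[i] - padded[i + 1] for i in range(1, n + 1)]

def sine_eigenpair(n, k):
    """Eigenvalue 2 - 2 cos(k pi h) and sine eigenvector, with h = 1/(n + 1)."""
    h = 1.0 / (n + 1)
    lam = 2 - 2 * math.cos(k * math.pi * h)
    x = [math.sin(j * k * math.pi * h) for j in range(1, n + 1)]
    return lam, x

n = 5
eigenvalues = []
for k in range(1, n + 1):
    lam, x = sine_eigenpair(n, k)
    Sx = second_difference_times(x)
    # Check S x = lambda x component by component.
    assert all(math.isclose(a, lam * b, abs_tol=1e-12) for a, b in zip(Sx, x))
    eigenvalues.append(lam)

print([round(lam, 4) for lam in eigenvalues])
print(round(sum(eigenvalues), 10), round(math.prod(eigenvalues), 10))
```

All five eigenvalues are positive, and their sum and product match the trace and determinant of $S$.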
Problem Set 6.5


Problems 1-13 are about tests for positive definiteness.


1 Suppose the 2 by 2 tests $a > 0$ and $ac - b^2 > 0$ are passed. Then $c > b^2/a \geq 0$.


(i) $\lambda_1$ and $\lambda_2$ have the same sign because their product $\lambda_1 \lambda_2$ equals ____.


(ii) That sign is positive because $\lambda_1 + \lambda_2$ equals ____.


Conclusion: The tests $a > 0$, $ac - b^2 > 0$ guarantee positive eigenvalues $\lambda_1, \lambda_2$.


2 Which of $S_1, S_2, S_3, S_4$ has two positive eigenvalues? Use a test, don’t compute the
$\lambda$’s. Also find an $x$ so that $x^T S_1 x < 0$, so $S_1$ is not positive definite.


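$$S_1 = \begin{bmatrix} 5 & 6 \\ 6 & 7 \end{bmatrix} \quad S_2 = \begin{bmatrix} -1 & -2 \\ -2 & -5 \end{bmatrix} \quad S_3 = \begin{bmatrix} 1 & 10 \\ 10 & 100 \end{bmatrix} \quad S_4 = \begin{bmatrix} 1 & 10 \\ 10 & 101 \end{bmatrix}$$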


3 For which numbers b and c are these matrices positive definite? 


so) sf of 4 


With the pivots in $D$ and multiplier in $L$, factor each matrix into $LDL^T$.


4 What is the function $f = ax^2 + 2bxy + cy^2$ for each of these matrices? Complete
the square to write each $f$ as a sum of one or two squares $f = d_1(\quad)^2 + d_2(\quad)^2$.


Ba) Ba) cesfells 


5 Write $f(x, y) = x^2 + 4xy + 3y^2$ as a difference of squares and find a point $(x, y)$
where $f$ is negative. No minimum at (0, 0) even though $f$ has positive coefficients.


6 The function f(x, y) = 2xy certainly has a saddle point and not a minimum at (0, 0). 
What symmetric matrix S produces this f? What are its eigenvalues? 


7 Test to see if $A^T A$ is positive definite in each case: $A$ needs independent columns.
i 1 
1 2 1 12 
A=) J and A= z and a=hi 2 ar 


8 The function $f(x, y) = 3(x + 2y)^2 + 4y^2$ is positive except at (0, 0). What is the
matrix in $f = \begin{bmatrix} x & y \end{bmatrix} S \begin{bmatrix} x & y \end{bmatrix}^T$? Check that the pivots of $S$ are 3 and 4.

9 Find the 3 by 3 matrix $S$ and its pivots, rank, eigenvalues, and determinant:

$$\begin{bmatrix} x_1 & x_2 & x_3 \end{bmatrix} S \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = 4(x_1 - x_2 + 2x_3)^2.$$

10 Which 3 by 3 symmetric matrices $S$ and $T$ produce these quadratics?
$x^T S x = 2(x_1^2 + x_2^2 + x_3^2 - x_1 x_2 - x_2 x_3)$. Why is $S$ positive definite?
$x^T T x = 2(x_1^2 + x_2^2 + x_3^2 - x_1 x_2 - x_1 x_3 - x_2 x_3)$. Why is $T$ semidefinite?


11 Compute the three upper left determinants of S to establish positive definiteness. 
Verify that their ratios give the second and third pivots. 


$$\text{Pivots = ratios of determinants} \qquad S = \begin{bmatrix} 2 & 2 & 0 \\ 2 & 5 & 3 \\ 0 & 3 & 8 \end{bmatrix}$$


12 For what numbers c and d are S and T positive definite? Test their 3 determinants: 


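$$S = \begin{bmatrix} c & 1 & 1 \\ 1 & c & 1 \\ 1 & 1 & c \end{bmatrix} \quad \text{and} \quad T = \begin{bmatrix} 1 & 2 & 3 \\ 2 & d & 4 \\ 3 & 4 & 5 \end{bmatrix}$$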


13 Find a matrix with $a > 0$ and $c > 0$ and $a + c > 2b$ that has a negative eigenvalue.


Problems 14-20 are about applications of the tests. 


14 If $S$ is positive definite then $S^{-1}$ is positive definite. Best proof: The eigenvalues
of $S^{-1}$ are positive because ____. Second proof (only for 2 by 2):

$$\text{The entries of } S^{-1} = \frac{1}{ac - b^2} \begin{bmatrix} c & -b \\ -b & a \end{bmatrix} \text{ pass the determinant tests.}$$

15 If $S$ and $T$ are positive definite, their sum $S + T$ is positive definite. Pivots and
eigenvalues are not convenient for $S + T$. Better to use $x^T (S + T) x > 0$. Also
$S = A^T A$ and $T = B^T B$ give $S + T = \begin{bmatrix} A \\ B \end{bmatrix}^T \begin{bmatrix} A \\ B \end{bmatrix}$ with independent columns.


16 A positive definite matrix cannot have a zero (or even worse, a negative number)
on its main diagonal. Show that this matrix fails to have $x^T S x > 0$:

$$\begin{bmatrix} x_1 & x_2 & x_3 \end{bmatrix} \begin{bmatrix} 4 & 1 & 1 \\ 1 & 0 & 2 \\ 1 & 2 & 5 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} \text{ is not positive when } (x_1, x_2, x_3) = (\quad, \quad, \quad).$$
1 2 5] [z3] 


17 A diagonal entry $s_{jj}$ of a symmetric matrix cannot be smaller than all the $\lambda$’s. If it
were, then $S - s_{jj} I$ would have ____ eigenvalues and would be positive definite.
But $S - s_{jj} I$ has a ____ on the main diagonal.

18 If $Sx = \lambda x$ then $x^T S x =$ ____. Why is this number positive when $\lambda > 0$?


19 Reverse Problem 18 to show that if all $\lambda > 0$ then $x^T S x > 0$. We must do this
for every nonzero $x$, not just the eigenvectors. So write $x$ as a combination of the
eigenvectors and explain why all “cross terms” are $x_i^T x_j = 0$. Then $x^T S x$ is

$$(c_1 x_1 + \cdots + c_n x_n)^T (c_1 \lambda_1 x_1 + \cdots + c_n \lambda_n x_n) = c_1^2 \lambda_1 x_1^T x_1 + \cdots + c_n^2 \lambda_n x_n^T x_n > 0.$$


20 Give a quick reason why each of these statements is true: 


(a) Every positive definite matrix is invertible. 
(b) The only positive definite projection matrix is $P = I$.
(c) A diagonal matrix with positive diagonal entries is positive definite. 


(d) A symmetric matrix with a positive determinant might not be positive definite! 
Problems 21-24 use the eigenvalues; Problems 25-27 are based on pivots. 


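21 For which $s$ and $t$ do $S$ and $T$ have all $\lambda > 0$ (therefore positive definite)?

$$S = \begin{bmatrix} s & -4 & -4 \\ -4 & s & -4 \\ -4 & -4 & s \end{bmatrix} \quad \text{and} \quad T = \begin{bmatrix} t & 3 & 0 \\ 3 & t & 4 \\ 0 & 4 & t \end{bmatrix}$$

22 From $S = Q \Lambda Q^T$ compute the positive definite symmetric square root $Q \sqrt{\Lambda}\, Q^T$
of each matrix. Check that this square root gives $A^T A = S$: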


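$$S = \begin{bmatrix} 5 & 4 \\ 4 & 5 \end{bmatrix} \quad \text{and} \quad S = \begin{bmatrix} 10 & 6 \\ 6 & 10 \end{bmatrix}$$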


23 You may have seen the equation for an ellipse as $x^2/a^2 + y^2/b^2 = 1$. What are $a$
and $b$ when the equation is written $\lambda_1 x^2 + \lambda_2 y^2 = 1$? The ellipse $9x^2 + 4y^2 = 1$
has axes with half-lengths $a =$ ____ and $b =$ ____.


24 Draw the tilted ellipse $x^2 + xy + y^2 = 1$ and find the half-lengths of its axes from
the eigenvalues of the corresponding matrix $S$.


25 With positive pivots in $D$, the factorization $S = LDL^T$ becomes $L \sqrt{D} \sqrt{D} L^T$.
(Square roots of the pivots give $D = \sqrt{D} \sqrt{D}$.) Then $C = \sqrt{D} L^T$ yields the
Cholesky factorization $S = C^T C$ which is “symmetrized $LU$”:


From $C = \begin{bmatrix} 3 & 1 \\ 0 & 2 \end{bmatrix}$ find $S$. From $S = \begin{bmatrix} 4 & 8 \\ 8 & 25 \end{bmatrix}$ find C = chol(S).


26 In the Cholesky factorization $S = C^T C$, with $C = \sqrt{D} L^T$, the square roots of the
pivots are on the diagonal of $C$. Find $C$ (upper triangular) for


1 
2 
2 


RESES 
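27 The symmetric factorization $S = LDL^T$ means that $x^T S x = x^T L D L^T x$:

$$\begin{bmatrix} x & y \end{bmatrix} \begin{bmatrix} a & b \\ b & c \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} x & y \end{bmatrix} \begin{bmatrix} 1 & 0 \\ b/a & 1 \end{bmatrix} \begin{bmatrix} a & 0 \\ 0 & (ac - b^2)/a \end{bmatrix} \begin{bmatrix} 1 & b/a \\ 0 & 1 \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix}$$

The left side is $ax^2 + 2bxy + cy^2$. The right side is $a \left( x + \frac{b}{a} y \right)^2 +$ ____ $y^2$.
The second pivot completes the square! Test with $a = 2$, $b = 4$, $c = 10$.


28 Without multiplying $S = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix} \begin{bmatrix} 2 & 0 \\ 0 & 5 \end{bmatrix} \begin{bmatrix} \cos\theta & \sin\theta \\ -\sin\theta & \cos\theta \end{bmatrix}$, find
(a) the determinant of $S$ (b) the eigenvalues of $S$
(c) the eigenvectors of $S$ (d) a reason why $S$ is symmetric positive definite.


29 For $F_1(x, y) = \frac{1}{4} x^4 + x^2 y + y^2$ and $F_2(x, y) = x^3 + xy - x$ find the second derivative
matrices $S_1$ and $S_2$:

$$\text{Test for minimum} \qquad S = \begin{bmatrix} \partial^2 F/\partial x^2 & \partial^2 F/\partial x \partial y \\ \partial^2 F/\partial y \partial x & \partial^2 F/\partial y^2 \end{bmatrix} \text{ is positive definite}$$

$S_1$ is positive definite so $F_1$ is concave up (= convex). Find the minimum point of $F_1$.
Find the saddle point of $F_2$ (look only where first derivatives are zero).


30 The graph of $z = x^2 + y^2$ is a bowl opening upward. The graph of $z = x^2 - y^2$ is a
saddle. The graph of $z = -x^2 - y^2$ is a bowl opening downward. What is a test on
$a, b, c$ for $z = ax^2 + 2bxy + cy^2$ to have a saddle point at $(x, y) = (0, 0)$?


31 Which values of $c$ give a bowl and which $c$ give a saddle point for the graph of
$z = 4x^2 + 12xy + cy^2$? Describe this graph at the borderline value of $c$.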


The Minimum of a Function $F(x, y, z)$


What tests would you expect for a minimum point? First come zero slopes:

$$\text{First derivatives are zero} \qquad \frac{\partial F}{\partial x} = \frac{\partial F}{\partial y} = \frac{\partial F}{\partial z} = 0 \text{ at the minimum point.}$$

Next comes the linear algebra version of the usual calculus test $d^2 f/dx^2 > 0$:

$$\text{Second derivative matrix } S \text{ is positive definite} \qquad S = \begin{bmatrix} F_{xx} & F_{xy} & F_{xz} \\ F_{yx} & F_{yy} & F_{yz} \\ F_{zx} & F_{zy} & F_{zz} \end{bmatrix}$$

Here $F_{xy} = \frac{\partial}{\partial x} \left( \frac{\partial F}{\partial y} \right) = \frac{\partial}{\partial y} \left( \frac{\partial F}{\partial x} \right) = F_{yx}$ is a “mixed” second derivative.

Challenge Problems


32 A group of nonsingular matrices includes $AB$ and $A^{-1}$ if it includes $A$ and $B$.
“Products and inverses stay in the group.” Which of these are groups (as in 2.7.37)?
Invent a “subgroup” of two of these groups (not $I$ by itself = the smallest group).


(a) Positive definite symmetric matrices $S$.
(b) Orthogonal matrices $Q$.
(c) All exponentials $e^{tA}$ of a fixed matrix $A$.
(d) Matrices $P$ with positive eigenvalues.
(e) Matrices $D$ with determinant 1.

33 When $S$ and $T$ are symmetric positive definite, $ST$ might not even be symmetric.
But its eigenvalues are still positive. Start from $STx = \lambda x$ and take dot products
with $Tx$. Then prove $\lambda > 0$.


34 Write down the 5 by 5 sine matrix $Q$ from Worked Example 6.5 C, containing the
eigenvectors of $S$ when $n = 5$ and $h = 1/6$. Multiply $SQ$ to see the five $\lambda$’s.
The sum of $\lambda$’s should equal the trace 10. Their product should be $\det S = 6$.


35 Suppose $C$ is positive definite (so $y^T C y > 0$ whenever $y \neq 0$) and $A$ has independent
columns (so $Ax \neq 0$ whenever $x \neq 0$). Apply the energy test to $x^T A^T C A x$
to show that $S = A^T C A$ is positive definite: the crucial matrix in engineering.

36 Important! Suppose $S$ is positive definite with eigenvalues $\lambda_1 \geq \lambda_2 \geq \cdots \geq \lambda_n$.
(a) What are the eigenvalues of the matrix $\lambda_1 I - S$? Is it positive semidefinite?
(b) How does it follow that $\lambda_1 x^T x \geq x^T S x$ for every $x$?
(c) Draw this conclusion: The maximum value of $x^T S x / x^T x$ is ____.


37 For which $a$ and $c$ is this matrix positive definite? For which $a$ and $c$ is it positive
semidefinite (this includes definite)?

$$S = \begin{bmatrix} a & a & a \\ a & a + c & a - c \\ a & a - c & a + c \end{bmatrix} \qquad \begin{array}{l} \text{All 5 tests are possible.} \\ \text{The energy } x^T S x \text{ equals} \\ a(x_1 + x_2 + x_3)^2 + c(x_2 - x_3)^2. \end{array}$$
Table of Eigenvalues and Eigenvectors


How are the properties of a matrix reflected in its eigenvalues and eigenvectors?
This question is fundamental throughout Chapter 6. A table that organizes the key facts may
be helpful. Here are the special properties of the eigenvalues $\lambda_i$ and the eigenvectors $x_i$.


Symmetric: $S^T = S = Q \Lambda Q^T$     real eigenvalues     orthogonal $x_i^T x_j = 0$
Orthogonal: $Q^T = Q^{-1}$     all $|\lambda| = 1$     orthogonal $\overline{x}_i^T x_j = 0$
Skew-symmetric: $A^T = -A$     imaginary $\lambda$’s     orthogonal $\overline{x}_i^T x_j = 0$
Complex Hermitian: $\overline{S}^T = S$     real $\lambda$’s     orthogonal $\overline{x}_i^T x_j = 0$
Positive Definite: $x^T S x > 0$     all $\lambda > 0$     orthogonal since $S^T = S$
Markov: $m_{ij} > 0$, $\sum_{i=1}^n m_{ij} = 1$     $\lambda_{\max} = 1$     steady state $x > 0$
Similar: $A = BCB^{-1}$     $\lambda(A) = \lambda(C)$     $B$ times eigenvector of $C$
Projection: $P = P^2 = P^T$     $\lambda = 1;\ 0$     column space; nullspace
Plane Rotation: cosine-sine     $e^{i\theta}$ and $e^{-i\theta}$     $x = (1, i)$ and $(1, -i)$
Reflection: $I - 2uu^T$     $\lambda = -1;\ 1, \ldots, 1$     $u$; whole plane $u^\perp$
Rank One: $uv^T$     $\lambda = v^T u;\ 0, \ldots, 0$     $u$; whole plane $v^\perp$
Inverse: $A^{-1}$     $1/\lambda(A)$     keep eigenvectors of $A$
Shift: $A + cI$     $\lambda(A) + c$     keep eigenvectors of $A$
Stable Powers: $A^n \to 0$     all $|\lambda| < 1$     any eigenvectors
Stable Exponential: $e^{At} \to 0$     all $\operatorname{Re} \lambda < 0$     any eigenvectors
Cyclic Permutation: $P_{i,i+1} = 1$, $P_{n1} = 1$     $\lambda_k = e^{2\pi i k/n}$ = roots of 1     $x_k = (1, \lambda_k, \ldots, \lambda_k^{n-1})$
Circulant: $c_0 I + c_1 P + \cdots$     $\lambda_k = c_0 + c_1 e^{2\pi i k/n} + \cdots$     $x_k = (1, e^{2\pi i k/n}, \ldots, e^{2\pi i k(n-1)/n})$
Tridiagonal: $-1, 2, -1$ on diagonals     $\lambda_k = 2 - 2\cos\frac{k\pi}{n+1}$     $x_k = \left( \sin\frac{k\pi}{n+1}, \sin\frac{2k\pi}{n+1}, \ldots \right)$
Diagonalizable: $A = X \Lambda X^{-1}$     diagonal of $\Lambda$     columns of $X$ are independent
Schur: $A = QTQ^{-1}$     diagonal of triangular $T$     columns of $Q$ if $A^T A = AA^T$
Jordan: $A = BJB^{-1}$     diagonal of $J$     each block gives 1 eigenvector
SVD: $A = U \Sigma V^T$     $r$ singular values in $\Sigma$     eigenvectors of $A^T A$, $AA^T$ in $V$, $U$


Chapter 7 


The Singular Value Decomposition (SVD) 


7.1 Image Processing by Linear Algebra 


1 An image is a large matrix of grayscale values, one for each pixel and color.
2 When nearby pixels are correlated (not random) the image can be compressed.
3 The SVD separates any matrix $A$ into rank one pieces $uv^T$ = (column)(row).
4 The columns and rows are eigenvectors of symmetric matrices $AA^T$ and $A^T A$.


The singular value theorem for $A$ is the eigenvalue theorem for $A^T A$ and $AA^T$.


That is a quick preview of what you will see in this chapter. $A$ has two sets of singular
vectors (the eigenvectors of $A^T A$ and $AA^T$). There is one set of positive singular values
(because $A^T A$ has the same positive eigenvalues as $AA^T$). $A$ is often rectangular, but
$A^T A$ and $AA^T$ are square, symmetric, and positive semidefinite.


The Singular Value Decomposition (SVD) separates any matrix into simple pieces. 


Each piece is a column vector times a row vector. An $m$ by $n$ matrix has $m$ times $n$ entries
(a big number when the matrix represents an image). But a column and a row only
have $m + n$ components, far fewer than $m$ times $n$. Those (column)(row) pieces are full
size matrices that can be processed with extreme speed: they need only $m$ plus $n$ numbers.


Unusually, this image processing application of the SVD is coming before the matrix
algebra it depends on. I will start with simple images that only involve one or two
pieces. Right now I am thinking of an image as a large rectangular matrix. The entries $a_{ij}$
tell the grayscales of all the pixels in the image. Think of a pixel as a small square, $i$ steps
across and $j$ steps up from the lower left corner. Its grayscale is a number (often a whole
number in the range $0 \leq a_{ij} < 256 = 2^8$). An all-white pixel has $a_{ij} = 255 = 11111111$.
That number has eight 1’s when the computer writes 255 in binary notation.

You see how an image that has $m$ times $n$ pixels, with each pixel using 8 bits (0 or 1)
for its grayscale, becomes an $m$ by $n$ matrix with 256 possible values for each entry $a_{ij}$.

In short, an image is a large matrix. To copy it perfectly, we need $8(m)(n)$ bits of
information. High definition television typically has $m = 1080$ and $n = 1920$. Often
there are 24 frames each second and you probably like to watch in color (3 color scales).
This requires transmitting $(3)(8)(49{,}766{,}400)$ bits per second. That is too expensive and
it is not done. The transmitter can’t keep up with the show.
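
The bit count is simple arithmetic, and it is easy to multiply out. This sketch uses only the frame size, frame rate, color scales, and bits per value quoted above; the variable names are illustrative.

```python
rows, cols = 1080, 1920        # HD frame size from the text
frames_per_second = 24
color_scales = 3
bits_per_value = 8

pixels_per_frame = rows * cols
pixels_per_second = pixels_per_frame * frames_per_second
bits_per_second = color_scales * bits_per_value * pixels_per_second

print(pixels_per_frame)     # 2073600
print(pixels_per_second)    # 49766400
print(bits_per_second)      # 1194393600
```

That is more than a billion bits every second before any compression.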

When compression is well done, you can’t see the difference from the original. 
Edges in the image (sudden changes in the grayscale) are the hard parts to compress. 

Major success in compression will be impossible if every $a_{ij}$ is an independent random
number. We totally depend on the fact that nearby pixels generally have similar grayscales.
An edge produces a sudden jump when you cross over it. Cartoons are more compressible
than real-world images, with edges everywhere.

For a video, the numbers $a_{ij}$ don’t change much between frames. We only transmit
the small changes. This is difference coding in the H.264 video compression standard (on
this book’s website). We compress each change matrix by linear algebra (and by nonlinear
“quantization” for an efficient step to integers in the computer).

The natural images that we see every day are absolutely ready and open for 
compression—but that doesn’t make it easy to do. 


Low Rank Images (Examples) 


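The easiest images to compress are all black or all white or all a constant grayscale $g$.
The matrix $A$ has the same number $g$ in every entry: $a_{ij} = g$. When $g = 1$ and $m = n = 6$,
here is an extreme example of the central SVD dogma of image processing:


Example 1

$$\text{Don't send } A = \begin{bmatrix} 1 & 1 & 1 & 1 & 1 & 1 \\ 1 & 1 & 1 & 1 & 1 & 1 \\ 1 & 1 & 1 & 1 & 1 & 1 \\ 1 & 1 & 1 & 1 & 1 & 1 \\ 1 & 1 & 1 & 1 & 1 & 1 \\ 1 & 1 & 1 & 1 & 1 & 1 \end{bmatrix} \qquad \text{Send this } A = \begin{bmatrix} 1 \\ 1 \\ 1 \\ 1 \\ 1 \\ 1 \end{bmatrix} \begin{bmatrix} 1 & 1 & 1 & 1 & 1 & 1 \end{bmatrix}$$

36 numbers become 12 numbers. With 300 by 300 pixels, 90,000 numbers become 600.
And if we define the all-ones vector $x$ in advance, we only have to send one number.
That number would be the constant grayscale $g$ that multiplies $xx^T$ to produce the matrix.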
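
The storage count for rank one pieces can be sketched in a few lines. The helper names `outer` and `numbers_to_send` are hypothetical, introduced only to illustrate the (column)(row) idea.

```python
def outer(u, v):
    """Rank one matrix u v^T as a list of rows."""
    return [[ui * vj for vj in v] for ui in u]

def numbers_to_send(m, n, rank=1):
    """Each rank one piece needs one column (m numbers) and one row (n numbers)."""
    return rank * (m + n)

g = 1
ones = [1] * 6
A = [[g * entry for entry in row] for row in outer(ones, ones)]

print(len(A) * len(A[0]), numbers_to_send(6, 6))    # 36 versus 12
print(300 * 300, numbers_to_send(300, 300))         # 90000 versus 600
print(numbers_to_send(6, 6, rank=2))                # 24 for a rank 2 image
```

The saving grows with the image size: $mn$ entries against $r(m + n)$ numbers for rank $r$.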


Of course this first example is extreme. But it makes an important point. If there 
are special vectors like x = ones that can usefully be defined in advance, then image 
processing can be extremely fast. The battle is between preselected bases (the Fourier 
basis allows speed-up from the FFT) and adaptive bases determined by the image. The 
SVD produces bases from the image itself—this is adaptive and it can be expensive. 

I am not saying that the SVD always or usually gives the most effective algorithm in
practice. The purpose of these next examples is instruction and not production.

Example 2 “ace flag”: French flag $A$, Italian flag $A$, German flag $A^T$

$$\text{Don't send } A = \begin{bmatrix} a & a & c & c & e & e \\ a & a & c & c & e & e \\ a & a & c & c & e & e \\ a & a & c & c & e & e \\ a & a & c & c & e & e \\ a & a & c & c & e & e \end{bmatrix} \qquad \text{Send } A = \begin{bmatrix} 1 \\ 1 \\ 1 \\ 1 \\ 1 \\ 1 \end{bmatrix} \begin{bmatrix} a & a & c & c & e & e \end{bmatrix}$$

This flag has 3 colors but it still has rank 1. We still have one column times one row.
The 36 entries could even be all different, provided they keep that rank 1 pattern $A = u_1 v_1^T$.
But when the rank moves up to $r = 2$, we need $u_1 v_1^T + u_2 v_2^T$. Here is one choice:


Example 3 Embedded square

$$A = \begin{bmatrix} 1 & 0 \\ 1 & 1 \end{bmatrix} \text{ is equal to } A = \begin{bmatrix} 1 \\ 1 \end{bmatrix} \begin{bmatrix} 1 & 1 \end{bmatrix} - \begin{bmatrix} 1 \\ 0 \end{bmatrix} \begin{bmatrix} 0 & 1 \end{bmatrix}$$

The 1’s and the 0 in $A$ could be blocks of 1’s and a block of 0’s. We would still
have rank 2. We would still only need two terms $u_1 v_1^T$ and $u_2 v_2^T$. A 6 by 6 image
would be compressed into 24 numbers. An $N$ by $N$ image ($N^2$ numbers) would be
compressed into $4N$ numbers from the four vectors $u_1, v_1, u_2, v_2$.


Have I made the best choice for the $u$’s and $v$’s? This is not the choice from the SVD!
I notice that $u_1 = (1, 1)$ is not orthogonal to $u_2 = (1, 0)$. And $v_1 = (1, 1)$ is not orthogonal
to $v_2 = (0, 1)$. The theory says that orthogonality will produce a smaller second piece
$\sigma_2 u_2 v_2^T$. (The SVD chooses rank one pieces in order of importance.)


If the rank of A is much higher than 2, as we expect for real images, then A will 
add up many rank one pieces. We want the small ones to be really small—they can be 
discarded with no loss to visual quality. Image compression becomes lossy, but good 
image compression is virtually undetectable by the human visual system. 

The question becomes: What are the orthogonal choices from the SVD? 


Eigenvectors for the SVD 


I want to introduce the use of eigenvectors. But the eigenvectors of most images are not
orthogonal. Furthermore the eigenvectors $x_1, x_2$ give only one set of vectors, and we want
two sets ($u$’s and $v$’s). The answer to both of those difficulties is the SVD idea:

Use the eigenvectors $u$ of $AA^T$ and the eigenvectors $v$ of $A^T A$.

Since $AA^T$ and $A^T A$ are automatically symmetric (but not usually equal!) the $u$’s will be
one orthogonal set and the eigenvectors $v$ will be another orthogonal set. We can and will
make them all unit vectors: $\|u_i\| = 1$ and $\|v_i\| = 1$. Then our rank 2 matrix will be
$A = \sigma_1 u_1 v_1^T + \sigma_2 u_2 v_2^T$. The size of those numbers $\sigma_1$ and $\sigma_2$ will decide whether they
can be ignored in compression. We keep larger $\sigma$’s, we discard small $\sigma$’s.

The $u$’s from the SVD are called left singular vectors (unit eigenvectors of $AA^T$).
The $v$’s are right singular vectors (unit eigenvectors of $A^T A$). The $\sigma$’s are singular
values, square roots of the equal eigenvalues of $AA^T$ and $A^T A$:

$$\text{Choices from the SVD} \qquad AA^T u_i = \sigma_i^2 u_i \qquad A^T A v_i = \sigma_i^2 v_i \qquad A v_i = \sigma_i u_i \tag{1}$$


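In Example 3 (the embedded square), here are the symmetric matrices $AA^T$ and $A^T A$:

$$AA^T = \begin{bmatrix} 1 & 0 \\ 1 & 1 \end{bmatrix} \begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix} = \begin{bmatrix} 1 & 1 \\ 1 & 2 \end{bmatrix} \qquad A^T A = \begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix} \begin{bmatrix} 1 & 0 \\ 1 & 1 \end{bmatrix} = \begin{bmatrix} 2 & 1 \\ 1 & 1 \end{bmatrix}$$

Their determinants are 1, so $\lambda_1 \lambda_2 = 1$. Their traces (diagonal sums) are 3:

$$\det \begin{bmatrix} 1 - \lambda & 1 \\ 1 & 2 - \lambda \end{bmatrix} = \lambda^2 - 3\lambda + 1 = 0 \text{ gives } \lambda_1 = \frac{3 + \sqrt{5}}{2} \text{ and } \lambda_2 = \frac{3 - \sqrt{5}}{2}.$$

The square roots of $\lambda_1$ and $\lambda_2$ are $\sigma_1 = \frac{\sqrt{5} + 1}{2}$ and $\sigma_2 = \frac{\sqrt{5} - 1}{2}$ with $\sigma_1 \sigma_2 = 1$.


The nearest rank 1 matrix to $A$ will be $\sigma_1 u_1 v_1^T$. The error is only $\sigma_2 \approx 0.6$ = best possible.


The orthonormal eigenvectors of $AA^T$ and $A^T A$ are

$$u_1 = \begin{bmatrix} 1 \\ \sigma_1 \end{bmatrix} \quad u_2 = \begin{bmatrix} \sigma_1 \\ -1 \end{bmatrix} \quad v_1 = \begin{bmatrix} \sigma_1 \\ 1 \end{bmatrix} \quad v_2 = \begin{bmatrix} 1 \\ -\sigma_1 \end{bmatrix} \quad \text{all divided by } \sqrt{1 + \sigma_1^2}. \tag{2}$$

Every reader understands that in real life those calculations are done by computers!
(Certainly not by unreliable professors. I corrected myself using svd(A) in MATLAB.)
And we can check that the matrix $A$ is correctly recovered from $\sigma_1 u_1 v_1^T + \sigma_2 u_2 v_2^T$:

$$A = \begin{bmatrix} u_1 & u_2 \end{bmatrix} \begin{bmatrix} \sigma_1 & \\ & \sigma_2 \end{bmatrix} \begin{bmatrix} v_1^T \\ v_2^T \end{bmatrix} \text{ or more simply } A \begin{bmatrix} v_1 & v_2 \end{bmatrix} = \begin{bmatrix} \sigma_1 u_1 & \sigma_2 u_2 \end{bmatrix} \tag{3}$$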
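
The recovery of the embedded square from two rank one pieces can be checked in plain Python. This sketch assumes the singular vectors $u_1 = (1, \sigma_1)$, $u_2 = (\sigma_1, -1)$, $v_1 = (\sigma_1, 1)$, $v_2 = (1, -\sigma_1)$, each divided by $\sqrt{1 + \sigma_1^2}$; the helper `rank_one` is illustrative.

```python
import math

sigma1 = (math.sqrt(5) + 1) / 2
sigma2 = (math.sqrt(5) - 1) / 2
scale = math.sqrt(1 + sigma1 ** 2)

u1 = [1 / scale, sigma1 / scale]
u2 = [sigma1 / scale, -1 / scale]
v1 = [sigma1 / scale, 1 / scale]
v2 = [1 / scale, -sigma1 / scale]

def rank_one(sigma, u, v):
    """The piece sigma * u * v^T as a 2 by 2 list of rows."""
    return [[sigma * ui * vj for vj in v] for ui in u]

piece1 = rank_one(sigma1, u1, v1)
piece2 = rank_one(sigma2, u2, v2)
recovered = [[piece1[i][j] + piece2[i][j] for j in range(2)] for i in range(2)]

print([[round(entry, 12) for entry in row] for row in recovered])
print(round(sigma1 * sigma2, 12), round(sigma1 ** 2 + sigma2 ** 2, 12))
```

The sum of the two pieces returns the matrix with rows (1, 0) and (1, 1) up to rounding, and the singular values reproduce the determinant 1 and the trace 3.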


Important The key point is not that images tend to have low rank. No: Images mostly 
have full rank. But they do have low effective rank. This means: Many singular values 
are small and can be set to zero. We transmit a low rank approximation. 


Example 4 Suppose the flag has two triangles of different colors. The lower left triangle
has 1’s and the upper right triangle has 0’s. The main diagonal is included with the 1’s.
Here is the image matrix when $n = 4$. It has full rank $r = 4$ so it is invertible:

$$\text{Triangular flag matrix} \qquad A = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 1 & 1 & 0 & 0 \\ 1 & 1 & 1 & 0 \\ 1 & 1 & 1 & 1 \end{bmatrix} \quad \text{and} \quad A^{-1} = \begin{bmatrix} 1 & 0 & 0 & 0 \\ -1 & 1 & 0 & 0 \\ 0 & -1 & 1 & 0 \\ 0 & 0 & -1 & 1 \end{bmatrix}$$

With full rank, $A$ has a full set of $n$ singular values $\sigma$ (all positive). The SVD will
produce $n$ pieces $\sigma_i u_i v_i^T$ of rank one. Perfect reproduction needs all $n$ pieces.

In compression small $\sigma$’s can be discarded with no serious loss in image quality. We
want to understand and plot the $\sigma$’s for $n = 4$ and also for large $n$. Notice that Example 3
was the special case $n = 2$ of this triangular Example 4.

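Working by hand, we begin with $AA^T$ (a computer would proceed differently):

$$AA^T = \begin{bmatrix} 1 & 1 & 1 & 1 \\ 1 & 2 & 2 & 2 \\ 1 & 2 & 3 & 3 \\ 1 & 2 & 3 & 4 \end{bmatrix} \quad \text{and} \quad (AA^T)^{-1} = (A^{-1})^T A^{-1} = \begin{bmatrix} 2 & -1 & 0 & 0 \\ -1 & 2 & -1 & 0 \\ 0 & -1 & 2 & -1 \\ 0 & 0 & -1 & 1 \end{bmatrix} \tag{4}$$

That $-1, 2, -1$ inverse matrix is included because its eigenvalues all have the form
$2 - 2\cos\theta$. So we know the $\lambda$’s for $AA^T$ and the $\sigma$’s for $A$:

$$\lambda = \frac{1}{2 - 2\cos\theta} = \frac{1}{4 \sin^2(\theta/2)} \quad \text{gives} \quad \sigma = \sqrt{\lambda} = \frac{1}{2 \sin(\theta/2)} \tag{5}$$

The $n$ different angles $\theta$ are equally spaced, which makes this example so exceptional:

$$\theta = \frac{\pi}{2n + 1}, \frac{3\pi}{2n + 1}, \ldots, \frac{(2n - 1)\pi}{2n + 1} \qquad \left( n = 4 \text{ includes } \theta = \frac{3\pi}{9} \text{ with } 2 \sin\frac{\theta}{2} = 1 \right)$$

That special case gives $\lambda = 1$ as an eigenvalue of $AA^T$ when $n = 4$. So $\sigma = \sqrt{\lambda} = 1$
is a singular value of $A$. You can check that the vector $u = (1, 1, 0, -1)$ has $AA^T u = u$
(a truly special case).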
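
Formula (5) can be given a quick consistency check for $n = 4$. This sketch does not compute an SVD; it only confirms that the squared $\sigma$’s from the formula add to the trace of $AA^T$ and multiply to its determinant (which is 1 because $\det A = 1$), and that $u = (1, 1, 0, -1)$ is an eigenvector.

```python
import math

n = 4
A = [[1 if j <= i else 0 for j in range(n)] for i in range(n)]   # lower triangle of 1's
AAT = [[sum(A[i][k] * A[j][k] for k in range(n)) for j in range(n)] for i in range(n)]

thetas = [(2 * j - 1) * math.pi / (2 * n + 1) for j in range(1, n + 1)]
sigmas = [1 / (2 * math.sin(t / 2)) for t in thetas]

# Squared singular values are the eigenvalues of A A^T, so they must
# add to its trace and multiply to its determinant.
trace = sum(AAT[i][i] for i in range(n))
print([round(s, 4) for s in sigmas])
print(round(sum(s * s for s in sigmas), 10), trace)
print(round(math.prod(s * s for s in sigmas), 10))

u = [1, 1, 0, -1]
print([sum(AAT[i][k] * u[k] for k in range(n)) for i in range(n)])   # equals u
```

The singular values fall from about 2.88 to about 0.53, a slow dropoff compared with the Hilbert matrix.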

The important point is to graph the n singular values of A. Those numbers drop off 
(unlike the eigenvalues of A, which are all 1). But the dropoff is not steep. So the SVD 
gives only moderate compression of this triangular flag. Great compression for Hilbert. 


Figure 7.1: Singular values of the triangle of 1’s in Examples 3-4 (not compressible) and
the evil Hilbert matrix $H(i, j) = (i + j - 1)^{-1}$ in Section 8.3: compress it to work with it.

Your faithful author has continued research on the ranks of flags. Quite a few are based on
horizontal or vertical stripes. Those have rank one—all rows or all columns are multiples 
of the ones vector (1,1,...,1). Armenia, Austria, Belgium, Bulgaria, Chad, Colombia, 
Ireland, Madagascar, Mali, Netherlands, Nigeria, Romania, Russia (and more) have three 
stripes. Indonesia and Poland have two! Libya was the extreme case in the Gadaffi years 
1977 to 2011 (the whole flag was green). 

At the other extreme, many flags include diagonal lines. Those could be long diagonals 
as in the British flag. Or they could be short diagonals coming from the edges of a star— 
as in the US flag. The text example of a triangle of ones shows how those flag matrices 
will have large rank. The rank increases to infinity as the pixel sizes get small. 

Other flags have circles or crescents or various curved shapes. Their ranks are large and 
also increasing to infinity. These are still compressible! The compressed image won’t be 
perfect but our eyes won't see the difference (with enough terms $\sigma_i u_i v_i^T$ from the SVD).
Those examples actually bring out the main purpose of image compression: 

Visual quality can be preserved even with a big reduction in the rank. 

For fun I looked back at the flags with finite rank. They can have stripes and they can 
also have crosses—provided the edges of the cross are horizontal or vertical. Some flags 
have a thin outline around the cross. This artistic touch will increase the rank. Right now 
my champion is the flag of Greece shown below, with a cross and also stripes. Its rank 
is three by my counting (three different columns). I see no US State Flags of finite rank ! 

The reader could google “national flags” to see the variety of designs and colors. I 
would be glad to know any finite rank examples with rank > 3. Good examples of all kinds 
will go on the book’s website math.mit.edu/linearalgebra (and flags in full color). 

Problem Set 7.1


1 What are the ranks r for these matrices with entries i times j and i plus j ? Write A
and B as the sum of r pieces $uv^T$ of rank one. Not requiring $u_1^Tu_2 = v_1^Tv_2 = 0$.


$$A = \begin{bmatrix} 1 & 2 & 3 & 4 \\ 2 & 4 & 6 & 8 \\ 3 & 6 & 9 & 12 \\ 4 & 8 & 12 & 16 \end{bmatrix} \qquad B = \begin{bmatrix} 2 & 3 & 4 & 5 \\ 3 & 4 & 5 & 6 \\ 4 & 5 & 6 & 7 \\ 5 & 6 & 7 & 8 \end{bmatrix}$$


2 We usually think that the identity matrix I is as simple as possible. But why is I
completely incompressible? Draw a rank 5 flag with a cross. 


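3 These flags have rank 2. Write A and B in any way as $u_1v_1^T + u_2v_2^T$.

$$A_{\text{Sweden}} = A_{\text{Finland}} = \begin{bmatrix} 1 & 2 & 1 & 1 \\ 2 & 2 & 2 & 2 \\ 1 & 2 & 1 & 1 \end{bmatrix} \qquad B_{\text{Benin}} = \begin{bmatrix} 1 & 2 & 2 \\ 1 & 3 & 3 \end{bmatrix}$$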


4 Now find the trace and determinant of $BB^T$ and $B^TB$ in
Problem 3. The singular values of B are close to $\sigma_1^2 = 28 - \frac{1}{14}$ and $\sigma_2^2 = \frac{1}{14}$.
Is B compressible or not?


5 Use [U, S, V] = svd(A) to find two orthogonal pieces $\sigma uv^T$ of $A_{\text{Sweden}}$.


6 Find the eigenvalues and the singular values of this 2 by 2 matrix A.

$$A = \begin{bmatrix} 2 & 1 \\ 4 & 2 \end{bmatrix} \quad\text{with}\quad A^TA = \begin{bmatrix} 20 & 10 \\ 10 & 5 \end{bmatrix} \quad\text{and}\quad AA^T = \begin{bmatrix} 5 & 10 \\ 10 & 20 \end{bmatrix}$$

The eigenvectors (1, 2) and (1, -2) of A are not orthogonal. How do you know the
eigenvectors $v_1, v_2$ of $A^TA$ are orthogonal? Notice that $A^TA$ and $AA^T$ have the
same eigenvalues (25 and 0).


7 How does the second form $AV = U\Sigma$ in equation (3) follow from the first form
$A = U\Sigma V^T$ ? That is the most famous form of the SVD.


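8 The two columns of $AV = U\Sigma$ are $Av_1 = \sigma_1u_1$ and $Av_2 = \sigma_2u_2$. So we hope that
$$\begin{bmatrix} 1 & 1 \\ 1 & 0 \end{bmatrix}\begin{bmatrix} \sigma_1 \\ 1 \end{bmatrix} = \sigma_1\begin{bmatrix} \sigma_1 \\ 1 \end{bmatrix} \quad\text{and}\quad \begin{bmatrix} 1 & 1 \\ 1 & 0 \end{bmatrix}\begin{bmatrix} 1 \\ -\sigma_1 \end{bmatrix} = \sigma_2\begin{bmatrix} -1 \\ \sigma_1 \end{bmatrix}$$
The first needs $\sigma_1 + 1 = \sigma_1^2$ and the second needs $1 - \sigma_1 = -\sigma_2$. Are those true?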


9 The MATLAB commands A = rand (20, 40) and B = randn (20, 40) produce 20 by 
40 random matrices. The entries of A are between 0 and 1 with uniform probability. 
The entries of B have a normal "bell-shaped" probability distribution. Using an svd
command, find and graph their singular values $\sigma_1$ to $\sigma_{20}$. Why do they have 20 $\sigma$'s ?

7.2 Bases and Matrices in the SVD


1 The SVD produces orthonormal bases of v's and u's for the four fundamental subspaces.

2 Using those bases, A becomes a diagonal matrix $\Sigma$ and $Av_i = \sigma_iu_i$ : $\sigma_i$ = singular value.

3 The two-bases diagonalization $A = U\Sigma V^T$ often has more information than $A = X\Lambda X^{-1}$.

4 $U\Sigma V^T$ separates A into rank-1 matrices $\sigma_1u_1v_1^T + \cdots + \sigma_ru_rv_r^T$. $\sigma_1u_1v_1^T$ is the largest!


The Singular Value Decomposition is a highlight of linear algebra. A is any m by n 
matrix, square or rectangular. Its rank is r. We will diagonalize this A, but not by $X^{-1}AX$.
The eigenvectors in X have three big problems: They are usually not orthogonal, there are
not always enough eigenvectors, and $Ax = \lambda x$ requires A to be a square matrix. The
singular vectors of A solve all those problems in a perfect way. 

Let me describe what we want from the SVD: the right bases for the four subspaces. 
Then I will write about the steps to find those basis vectors in order of importance. 


The price we pay is to have two sets of singular vectors, u's and v's. The u's are in
$\mathbf{R}^m$ and the v's are in $\mathbf{R}^n$. They will be the columns of an m by m matrix U and an n by
n matrix V. I will first describe the SVD in terms of those basis vectors. Then I can also 
describe the SVD in terms of the orthogonal matrices U and V. 


(using vectors) The u's and v's give bases for the four fundamental subspaces:


$u_1, \ldots, u_r$ is an orthonormal basis for the column space
$u_{r+1}, \ldots, u_m$ is an orthonormal basis for the left nullspace $N(A^T)$
$v_1, \ldots, v_r$ is an orthonormal basis for the row space
$v_{r+1}, \ldots, v_n$ is an orthonormal basis for the nullspace $N(A)$.


More than just orthogonality, these basis vectors diagonalize the matrix A: 


"A is diagonalized"
$$Av_1 = \sigma_1u_1 \qquad Av_2 = \sigma_2u_2 \qquad \ldots \qquad Av_r = \sigma_ru_r \tag{1}$$


Those singular values $\sigma_1$ to $\sigma_r$ will be positive numbers: $\sigma_i$ is the length of $Av_i$.
The $\sigma$'s go into a diagonal matrix that is otherwise zero. That matrix is $\Sigma$.


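(using matrices) Since the u's are orthonormal, the matrix $U_r$ with those r columns
has $U_r^TU_r = I$. Since the v's are orthonormal, the matrix $V_r$ has $V_r^TV_r = I$. Then the
equations $Av_i = \sigma_iu_i$ tell us column by column that $AV_r = U_r\Sigma_r$:


$$AV_r = U_r\Sigma_r \qquad A\begin{bmatrix} v_1 & \cdots & v_r \end{bmatrix} = \begin{bmatrix} u_1 & \cdots & u_r \end{bmatrix}\begin{bmatrix} \sigma_1 & & \\ & \ddots & \\ & & \sigma_r \end{bmatrix} \tag{2}$$
The shapes are (m by n)(n by r) on the left and (m by r)(r by r) on the right.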

This is the heart of the SVD, but there is more. Those v's and u's account for the row
space and column space of A. We have n - r more v's and m - r more u's, from the
nullspace N(A) and the left nullspace $N(A^T)$. They are automatically orthogonal to the
first v's and u's (because the whole nullspaces are orthogonal). We now include all the
v's and u's in V and U, so these matrices become square. We still have $AV = U\Sigma$.


$$AV = U\Sigma \qquad A\begin{bmatrix} v_1 & \cdots & v_r & \cdots & v_n \end{bmatrix} = \begin{bmatrix} u_1 & \cdots & u_r & \cdots & u_m \end{bmatrix}\begin{bmatrix} \sigma_1 & & & \\ & \ddots & & \\ & & \sigma_r & \\ & & & 0 \end{bmatrix} \tag{3}$$
The shapes are (m by n)(n by n) on the left and (m by m)(m by n) on the right.


The new $\Sigma$ is m by n. It is just the r by r matrix in equation (2) with m - r extra zero
rows and n - r new zero columns. The real change is in the shapes of U and V. Those
are square matrices and $V^{-1} = V^T$. So $AV = U\Sigma$ becomes $A = U\Sigma V^T$. This is the
Singular Value Decomposition. I can multiply columns $u_i\sigma_i$ from $U\Sigma$ by rows of $V^T$:


SVD
$$A = U\Sigma V^T = u_1\sigma_1v_1^T + \cdots + u_r\sigma_rv_r^T. \tag{4}$$


Equation (2) was a "reduced SVD" with bases for the row space and column space.
Equation (3) is the full SVD with nullspaces included. They both split up A into the same
r matrices $u_i\sigma_iv_i^T$ of rank one. Column times row is the fourth way to multiply matrices.

We will see that each $\sigma_i^2$ is an eigenvalue of $A^TA$ and also $AA^T$. When we put the
singular values in descending order, $\sigma_1 \ge \sigma_2 \ge \cdots \ge \sigma_r > 0$, the splitting in equation (4)
gives the r rank-one pieces of A in order of importance. This is crucial.


Example 1 When is $A = U\Sigma V^T$ (singular values) the same as $X\Lambda X^{-1}$ (eigenvalues) ?


Solution A needs orthonormal eigenvectors to allow X = U = V. A also needs
eigenvalues $\lambda \ge 0$ if $\Lambda = \Sigma$. So A must be a positive semidefinite (or definite) symmetric
matrix. Only then will $A = X\Lambda X^{-1}$ which is also $Q\Lambda Q^T$ coincide with $A = U\Sigma V^T$.

Example 2 If $A = xy^T$ (rank 1) with unit vectors x and y, what is the SVD of A?


Solution The reduced SVD in (2) is exactly $xy^T$, with rank r = 1. It has $u_1 = x$ and
$v_1 = y$ and $\sigma_1 = 1$. For the full SVD, complete $u_1 = x$ to an orthonormal basis
of u's, and complete $v_1 = y$ to an orthonormal basis of v's. No new $\sigma$'s, only $\sigma_1 = 1$.


Proof of the SVD 


We need to show how those amazing u's and v's can be constructed. The v's will be
orthonormal eigenvectors of $A^TA$. This must be true because we are aiming for
$$A^TA = (U\Sigma V^T)^T(U\Sigma V^T) = V\Sigma^TU^TU\Sigma V^T = V\Sigma^T\Sigma V^T. \tag{5}$$


On the right you see the eigenvector matrix V for the symmetric positive (semi) definite
matrix $A^TA$. And $(\Sigma^T\Sigma)$ must be the eigenvalue matrix of $(A^TA)$ : Each $\sigma^2$ is $\lambda(A^TA)$!

Now $Av_i = \sigma_iu_i$ tells us the unit vectors $u_1$ to $u_r$. This is the key equation (1).
The essential point (the whole reason that the SVD succeeds) is that those unit vectors
$u_1$ to $u_r$ are automatically orthogonal to each other (because the v's are orthogonal):


Key step
$$u_i^Tu_j = \left(\frac{Av_i}{\sigma_i}\right)^T\left(\frac{Av_j}{\sigma_j}\right) = \frac{v_i^TA^TAv_j}{\sigma_i\sigma_j} = \frac{\sigma_j^2}{\sigma_i\sigma_j}\,v_i^Tv_j = \text{zero}. \tag{6}$$


The v's are eigenvectors of $A^TA$ (symmetric). They are orthogonal and now the u's are
also orthogonal. Actually those u's will be eigenvectors of $AA^T$.
Finally we complete the v's and u's to n v's and m u's with any orthonormal bases
for the nullspaces N(A) and $N(A^T)$. We have found V and $\Sigma$ and U in $A = U\Sigma V^T$.
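
The construction in this proof can be followed step by step in code. This is only a sketch, assuming NumPy and a small rectangular test matrix chosen for illustration (it is not one of the text's examples): take eigenvectors of $A^TA$, keep the r positive eigenvalues, and set $u_i = Av_i/\sigma_i$.

```python
import numpy as np

A = np.array([[1.0, 1.0, 0.0],
              [0.0, 1.0, 1.0]])       # illustrative 2 by 3 matrix of rank 2

# Step 1: orthonormal eigenvectors of the symmetric matrix A^T A
lam, V = np.linalg.eigh(A.T @ A)      # eigh returns ascending eigenvalues
order = np.argsort(lam)[::-1]         # reorder so the largest comes first
lam, V = lam[order], V[:, order]

# Step 2: the r positive eigenvalues give sigma_i = sqrt(lambda_i)
r = int(np.sum(lam > 1e-12))
sigma = np.sqrt(lam[:r])
Vr = V[:, :r]

# Step 3: u_i = A v_i / sigma_i (column i of A @ Vr divided by sigma_i)
Ur = (A @ Vr) / sigma

print(sigma**2)
print(np.allclose(Ur.T @ Ur, np.eye(r)))             # the u's are orthonormal
print(np.allclose(Ur @ np.diag(sigma) @ Vr.T, A))    # reduced SVD rebuilds A
```

The remaining columns of `V` (here one column) span the nullspace of A, as in the last step of the proof.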


An Example of the SVD 
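Here is an example to show the computation of all three matrices in $A = U\Sigma V^T$.


Example 3 Find the matrices U, $\Sigma$, V for $A = \begin{bmatrix} 3 & 0 \\ 4 & 5 \end{bmatrix}$. The rank is r = 2.


With rank 2, this A has positive singular values $\sigma_1$ and $\sigma_2$. We will see that $\sigma_1$ is larger
than $\lambda_{\max} = 5$, and $\sigma_2$ is smaller than $\lambda_{\min} = 3$. Begin with $A^TA$ and $AA^T$ :


$$A^TA = \begin{bmatrix} 25 & 20 \\ 20 & 25 \end{bmatrix} \qquad AA^T = \begin{bmatrix} 9 & 12 \\ 12 & 41 \end{bmatrix}$$


Those have the same trace (50) and the same eigenvalues $\sigma_1^2 = 45$ and $\sigma_2^2 = 5$. The square
roots are $\sigma_1 = \sqrt{45}$ and $\sigma_2 = \sqrt{5}$. Then $\sigma_1\sigma_2 = 15$ and this is the determinant of A.
A key step is to find the eigenvectors of $A^TA$ (with eigenvalues 45 and 5):


$$\begin{bmatrix} 25 & 20 \\ 20 & 25 \end{bmatrix}\begin{bmatrix} 1 \\ 1 \end{bmatrix} = 45\begin{bmatrix} 1 \\ 1 \end{bmatrix} \qquad \begin{bmatrix} 25 & 20 \\ 20 & 25 \end{bmatrix}\begin{bmatrix} -1 \\ 1 \end{bmatrix} = 5\begin{bmatrix} -1 \\ 1 \end{bmatrix}$$
Then $v_1$ and $v_2$ are those orthogonal eigenvectors rescaled to length 1. Divide by $\sqrt{2}$.


$$\text{Right singular vectors}\quad v_1 = \frac{1}{\sqrt{2}}\begin{bmatrix} 1 \\ 1 \end{bmatrix} \quad v_2 = \frac{1}{\sqrt{2}}\begin{bmatrix} -1 \\ 1 \end{bmatrix} \qquad \text{Left singular vectors}\quad u_i = \frac{Av_i}{\sigma_i}$$


Now compute $Av_1$ and $Av_2$ which will be $\sigma_1u_1 = \sqrt{45}\,u_1$ and $\sigma_2u_2 = \sqrt{5}\,u_2$ :


$$Av_1 = \frac{3}{\sqrt{2}}\begin{bmatrix} 1 \\ 3 \end{bmatrix} = \sqrt{45}\,\frac{1}{\sqrt{10}}\begin{bmatrix} 1 \\ 3 \end{bmatrix} = \sigma_1u_1$$


$$Av_2 = \frac{1}{\sqrt{2}}\begin{bmatrix} -3 \\ 1 \end{bmatrix} = \sqrt{5}\,\frac{1}{\sqrt{10}}\begin{bmatrix} -3 \\ 1 \end{bmatrix} = \sigma_2u_2$$


The division by $\sqrt{10}$ makes $u_1$ and $u_2$ orthonormal. Then $\sigma_1 = \sqrt{45}$ and $\sigma_2 = \sqrt{5}$
as expected. The Singular Value Decomposition of A is U times $\Sigma$ times $V^T$.


$$U = \frac{1}{\sqrt{10}}\begin{bmatrix} 1 & -3 \\ 3 & 1 \end{bmatrix} \qquad \Sigma = \begin{bmatrix} \sqrt{45} & \\ & \sqrt{5} \end{bmatrix} \qquad V = \frac{1}{\sqrt{2}}\begin{bmatrix} 1 & -1 \\ 1 & 1 \end{bmatrix} \tag{7}$$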

U and V contain orthonormal bases for the column space and the row space (both spaces
are just $\mathbf{R}^2$). The real achievement is that those two bases diagonalize A: AV equals $U\Sigma$.
The matrix A splits into a combination of two rank-one matrices, columns times rows:
$$\sigma_1u_1v_1^T + \sigma_2u_2v_2^T = \frac{\sqrt{45}}{\sqrt{20}}\begin{bmatrix} 1 & 1 \\ 3 & 3 \end{bmatrix} + \frac{\sqrt{5}}{\sqrt{20}}\begin{bmatrix} 3 & -3 \\ -1 & 1 \end{bmatrix} = \begin{bmatrix} 3 & 0 \\ 4 & 5 \end{bmatrix} = A.$$
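
The numbers in Example 3 can be reproduced with a library SVD. This is a sketch assuming NumPy. The library may flip the signs of $u_i$ and $v_i$ together, which leaves each rank-one piece unchanged.

```python
import numpy as np

A = np.array([[3.0, 0.0],
              [4.0, 5.0]])

U, s, Vt = np.linalg.svd(A)           # s holds sigma_1 >= sigma_2

# Rank-one pieces sigma_i * u_i * v_i^T (columns times rows)
piece1 = s[0] * np.outer(U[:, 0], Vt[0])
piece2 = s[1] * np.outer(U[:, 1], Vt[1])

print(s**2)               # the eigenvalues 45 and 5 of A^T A
print(piece1)             # 1.5 times [[1, 1], [3, 3]]
print(piece1 + piece2)    # recovers A
```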


An Extreme Matrix 


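Here is a larger example, when the u's and the v's are just columns of the identity matrix.
So the computations are easy, but keep your eye on the order of the columns. The matrix
A is badly lopsided (strictly triangular). All its eigenvalues are zero. $AA^T$ is not close to
$A^TA$. The matrices U and V will be permutations that fix these problems properly.


$$A = \begin{bmatrix} 0 & 1 & 0 & 0 \\ 0 & 0 & 2 & 0 \\ 0 & 0 & 0 & 3 \\ 0 & 0 & 0 & 0 \end{bmatrix}$$
eigenvalues $\lambda$ = 0, 0, 0, 0 all zero !
only one eigenvector (1, 0, 0, 0)
singular values $\sigma$ = 3, 2, 1
singular vectors are columns of I


$A^TA$ and $AA^T$ are diagonal (with easy eigenvectors, but in different orders):

$$A^TA = \begin{bmatrix} 0 & & & \\ & 1 & & \\ & & 4 & \\ & & & 9 \end{bmatrix} \qquad AA^T = \begin{bmatrix} 1 & & & \\ & 4 & & \\ & & 9 & \\ & & & 0 \end{bmatrix}$$


Their eigenvectors (u's for $AA^T$ and v's for $A^TA$) go in decreasing order $\sigma_1^2 > \sigma_2^2 > \sigma_3^2$
of the eigenvalues. Those eigenvalues are $\sigma^2$ = 9, 4, 1.


$$U = \begin{bmatrix} 0 & 0 & 1 & 0 \\ 0 & 1 & 0 & 0 \\ 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix} \qquad \Sigma = \begin{bmatrix} 3 & & & \\ & 2 & & \\ & & 1 & \\ & & & 0 \end{bmatrix} \qquad V = \begin{bmatrix} 0 & 0 & 0 & 1 \\ 0 & 0 & 1 & 0 \\ 0 & 1 & 0 & 0 \\ 1 & 0 & 0 & 0 \end{bmatrix}$$


Those first columns $u_1$ and $v_1$ have 1's in positions 3 and 4. Then $u_1\sigma_1v_1^T$ picks out the
biggest number $A_{34} = 3$ in the original matrix A. The three rank-one matrices in the SVD
come (for this extreme example) exactly from the numbers 3, 2, 1 in A.


$$A = U\Sigma V^T = 3u_1v_1^T + 2u_2v_2^T + 1u_3v_3^T.$$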


Note Suppose I remove the last row of A (all zeros). Then A is a 3 by 4 matrix and
$AA^T$ is 3 by 3: its fourth row and column will disappear. We still have eigenvalues
$\lambda$ = 1, 4, 9 in $A^TA$ and $AA^T$, producing the same singular values $\sigma$ = 3, 2, 1 in $\Sigma$.

Removing the zero row of A (now 3 x 4) just removes the last row of $\Sigma$ and also
the last row and column of U. Then (3 x 4) = $U\Sigma V^T$ = (3 x 3)(3 x 4)(4 x 4). The SVD
is totally adapted to rectangular matrices.

A good thing, because the rows and columns of a data matrix A often have completely 
different meanings (like a spreadsheet). If we have the grades for all courses, there would 
be a column for each student and a row for each course: The entry $a_{ij}$ would be the grade.
Then $\sigma_1u_1v_1^T$ could have $u_1$ = combination course and $v_1$ = combination student.
And $\sigma_1$ would be the grade for those combinations: the highest grade.

The matrix A could count the frequency of key words in a journal: A different article
for each column of A and a different word for each row. The whole journal is indexed
by the matrix A and the most important information is in $\sigma_1u_1v_1^T$. Then $\sigma_1$ is the largest
frequency for a hyperword (the word combination $u_1$) in the hyperarticle $v_1$.

Section 7.3 will apply the SVD to finance and genetics and search engines. 


Singular Value Stability versus Eigenvalue Instability 


The 4 by 4 example A provides an example (an extreme case) of the instability of
eigenvalues. Suppose the 4,1 entry barely changes from zero to 1/60,000. The rank is now 4.


$$A = \begin{bmatrix} 0 & 1 & 0 & 0 \\ 0 & 0 & 2 & 0 \\ 0 & 0 & 0 & 3 \\ \frac{1}{60{,}000} & 0 & 0 & 0 \end{bmatrix}$$
That change by only 1/60,000 produces a much bigger jump in the eigenvalues of A:
$$\lambda = 0, 0, 0, 0 \quad\text{to}\quad \lambda = \frac{1}{10},\ \frac{i}{10},\ \frac{-1}{10},\ \frac{-i}{10}$$
The reason is that $\det(A - \lambda I) = \lambda^4 - (1)(2)(3)/60{,}000 = \lambda^4 - 1/10{,}000$.

The four eigenvalues moved from zero onto a circle around zero. The circle has radius 1/10
when the new entry is only 1/60,000. This shows serious instability of eigenvalues when
$AA^T$ is far from $A^TA$. At the other extreme, if $A^TA = AA^T$ (a "normal matrix")
the eigenvectors of A are orthogonal and the eigenvalues of A are totally stable.

By contrast, the singular values of any matrix are stable. They don't change more
than the change in A. In this example, the new singular values are 3, 2, 1, and 1/60,000.
The matrices U and V stay the same. The new fourth piece of A is $\sigma_4u_4v_4^T$, with
fifteen zeros and that small entry $\sigma_4$ = 1/60,000.
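
This contrast is easy to observe numerically. The sketch below assumes NumPy. It compares eigenvalues and singular values of the 4 by 4 example before and after the 4,1 entry changes.

```python
import numpy as np

A = np.diag([1.0, 2.0, 3.0], k=1)     # strictly upper triangular 4 by 4 example
A_new = A.copy()
A_new[3, 0] = 1.0 / 60000.0           # the 4,1 entry changes from zero

eig_old = np.linalg.eigvals(A)
eig_new = np.linalg.eigvals(A_new)
sv_old = np.linalg.svd(A, compute_uv=False)
sv_new = np.linalg.svd(A_new, compute_uv=False)

print(np.abs(eig_old))    # eigenvalues of the original A are zero
print(np.abs(eig_new))    # all four moduli are close to 0.1
print(sv_old)             # 3, 2, 1, 0
print(sv_new)             # 3, 2, 1, 1/60000
```

The eigenvalues move by 1/10 while no singular value moves by more than 1/60,000.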


Singular Vectors of A and Eigenvectors of $S = A^TA$


Equations (5-6) "proved" the SVD all at once. The singular vectors $v_i$ are the eigenvectors
$q_i$ of $S = A^TA$. The eigenvalues $\lambda_i$ of S are the same as $\sigma_i^2$ for A. The rank r of S equals
the rank of A. The expansions in eigenvectors and singular vectors are perfectly parallel.


Symmetric S
$$S = Q\Lambda Q^T = \lambda_1q_1q_1^T + \lambda_2q_2q_2^T + \cdots + \lambda_rq_rq_r^T$$


Any matrix A
$$A = U\Sigma V^T = \sigma_1u_1v_1^T + \sigma_2u_2v_2^T + \cdots + \sigma_ru_rv_r^T$$

The q's are orthonormal, the u's are orthonormal, the v's are orthonormal. Beautiful.

But I want to look again, for two good reasons. One is to fix a weak point in the
eigenvalue part, where Chapter 6 was not complete. If $\lambda$ is a double eigenvalue of S, we
can and must find two orthonormal eigenvectors. The other reason is to see how the SVD
picks off the largest term $\sigma_1u_1v_1^T$ before $\sigma_2u_2v_2^T$. We want to understand the eigenvalues
$\lambda$ (of S) and the singular values $\sigma$ (of A) one at a time instead of all at once.


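Start with the largest eigenvalue $\lambda_1$ of S. It solves this problem:


$$\lambda_1 = \text{maximum ratio } \frac{x^TSx}{x^Tx}. \quad\text{The winning vector is } x = q_1 \text{ with } Sq_1 = \lambda_1q_1. \tag{8}$$


Compare with the largest singular value $\sigma_1$ of A. It solves this problem:


$$\sigma_1 = \text{maximum ratio } \frac{\|Ax\|}{\|x\|}. \quad\text{The winning vector is } x = v_1 \text{ with } Av_1 = \sigma_1u_1. \tag{9}$$


The same approach gives $\lambda_2$ and $\sigma_2$, but now not every x is allowed:


$$\lambda_2 = \text{maximum ratio } \frac{x^TSx}{x^Tx} \text{ among all } x\text{'s with } q_1^Tx = 0. \quad x = q_2 \text{ will win.} \tag{10}$$
$$\sigma_2 = \text{maximum ratio } \frac{\|Ax\|}{\|x\|} \text{ among all } x\text{'s with } v_1^Tx = 0. \quad x = v_2 \text{ will win.} \tag{11}$$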


When $S = A^TA$ we find $\lambda_1 = \sigma_1^2$ and $\lambda_2 = \sigma_2^2$. Why does this approach succeed?
Start with the ratio $r(x) = x^TSx/x^Tx$. This is called the Rayleigh quotient. To
maximize r(x), set its partial derivatives to zero: $\partial r/\partial x_i = 0$ for i = 1, ..., n. Those
derivatives are messy and here is the result: one vector equation for the winning x:


$$\text{The derivatives of } r(x) = \frac{x^TSx}{x^Tx} \text{ are zero when } Sx = r(x)\,x. \tag{12}$$


So the winning x is an eigenvector of S. The maximum ratio r(x) is the largest eigenvalue
$\lambda_1$ of S. All good. Now turn to A, and notice the connection to $S = A^TA$ !


$$\text{Maximizing } \frac{\|Ax\|}{\|x\|} \text{ also maximizes } \left(\frac{\|Ax\|}{\|x\|}\right)^2 = \frac{x^TA^TAx}{x^Tx} = \frac{x^TSx}{x^Tx}.$$


So the winning $x = v_1$ in (9) is the same as the top eigenvector $q_1$ of $S = A^TA$ in (8).
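
A quick numerical experiment illustrates this maximum principle. The sketch assumes NumPy and reuses the matrix of Example 3. The ratio at the top eigenvector of $S = A^TA$ equals $\sigma_1 = \sqrt{45}$, and random trial vectors do not beat it.

```python
import numpy as np

rng = np.random.default_rng(0)
A = np.array([[3.0, 0.0],
              [4.0, 5.0]])
S = A.T @ A

def ratio(x):
    """Return ||Ax|| / ||x|| for a nonzero vector x."""
    return np.linalg.norm(A @ x) / np.linalg.norm(x)

# Top eigenvector q1 of S (eigh sorts eigenvalues in ascending order)
lam, Q = np.linalg.eigh(S)
q1, lam1 = Q[:, -1], lam[-1]

samples = rng.standard_normal((1000, 2))        # random trial vectors
best_random = max(ratio(x) for x in samples)

print(ratio(q1), np.sqrt(lam1))                 # both equal sigma_1
print(best_random <= ratio(q1) + 1e-12)
```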


Now I have to explain why $q_2$ and $v_2$ are the winning vectors in (10) and (11). We
know they are orthogonal to $q_1$ and $v_1$, so they are allowed in those competitions. These
paragraphs can be optional for readers who aim to see the SVD in action (Section 7.3).

Start with any orthogonal matrix $Q_1$ that has $q_1$ in its first column. The other n - 1
orthonormal columns just have to be orthogonal to $q_1$. Then use $Sq_1 = \lambda_1q_1$ :


$$SQ_1 = S\begin{bmatrix} q_1 & q_2 & \cdots & q_n \end{bmatrix} = \begin{bmatrix} q_1 & q_2 & \cdots & q_n \end{bmatrix}\begin{bmatrix} \lambda_1 & w^T \\ 0 & S_{n-1} \end{bmatrix} = Q_1\begin{bmatrix} \lambda_1 & w^T \\ 0 & S_{n-1} \end{bmatrix}. \tag{13}$$


Multiply by $Q_1^T$, remember $Q_1^TQ_1 = I$, and recognize that $Q_1^TSQ_1$ is symmetric like S:


$$\text{The symmetry of } Q_1^TSQ_1 = \begin{bmatrix} \lambda_1 & w^T \\ 0 & S_{n-1} \end{bmatrix} \text{ forces } w = 0 \text{ and } S_{n-1}^T = S_{n-1}.$$


The requirement $q_1^Tx = 0$ has reduced the maximum problem (10) to size n - 1. The
largest eigenvalue of $S_{n-1}$ will be the second largest for S. It is $\lambda_2$. The winning vector
in (10) will be the eigenvector $q_2$ with $Sq_2 = \lambda_2q_2$.

We just keep going (or use the magic word induction) to produce all the eigenvectors
$q_1, \ldots, q_n$ and their eigenvalues $\lambda_1, \ldots, \lambda_n$. The Spectral Theorem $S = Q\Lambda Q^T$ is proved
even with repeated eigenvalues. All symmetric matrices can be diagonalized.


Similarly the SVD is found one step at a time from (9) and (11) and onwards. Section
7.4 will show the geometry: we are finding the axes of an ellipse. Here I ask a different
question: How are the $\lambda$'s and $\sigma$'s actually computed?


Computing the Eigenvalues of S and Singular Values of A 


The singular values $\sigma_i$ of A are the square roots of the eigenvalues $\lambda_i$ of $S = A^TA$.
This connects the SVD to a symmetric eigenvalue problem (good). But in the end we don't
want to multiply $A^T$ times A (squaring is time-consuming: not good).

The first idea is to produce zeros in A and S without changing any $\sigma$'s and $\lambda$'s.
Singular vectors and eigenvectors will change (no problem). The similar matrix $Q^{-1}SQ$
has the same $\lambda$'s as S. If Q is orthogonal, this matrix is $Q^TSQ$ and still symmetric.


Section 11.3 will show how to build Q from 2 by 2 rotations so that $Q^TSQ$ is
symmetric and tridiagonal (many zeros). But rotations can't get all the way to a
diagonal matrix. To show all the eigenvalues of S needs a new idea and more work.


For the SVD, what is the parallel to $Q^TSQ$ ? Now we don't want to change any singular
values of A. Natural answer: You can multiply A by two different orthogonal matrices $Q_1$
and $Q_2$. Use them to produce zeros in $Q_1^TAQ_2$. The $\sigma$'s don't change:


$$(Q_1^TAQ_2)^T(Q_1^TAQ_2) = Q_2^TA^TAQ_2 = Q_2^TSQ_2 \text{ gives the same } \sigma(A) \text{ and } \lambda(S).$$


The freedom of two Q's allows us to reach $Q_1^TAQ_2$ = bidiagonal matrix (2 diagonals).
This compares perfectly to $Q^TSQ$ = 3 diagonals. It is nice to notice the connection
between them: $(\text{bidiagonal})^T(\text{bidiagonal})$ = tridiagonal.
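
Both facts can be checked on small matrices. The sketch assumes NumPy. The orthogonal matrices come from QR factorizations of random matrices, and the bidiagonal entries are arbitrary illustrative numbers.

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((4, 3))

# Random orthogonal matrices from QR factorizations of random square matrices
Q1, _ = np.linalg.qr(rng.standard_normal((4, 4)))
Q2, _ = np.linalg.qr(rng.standard_normal((3, 3)))

sv_A = np.linalg.svd(A, compute_uv=False)
sv_QAQ = np.linalg.svd(Q1.T @ A @ Q2, compute_uv=False)

# Upper bidiagonal B: main diagonal plus one superdiagonal
B = np.diag([1.0, 2.0, 3.0, 4.0]) + np.diag([5.0, 6.0, 7.0], k=1)
T = B.T @ B
outside = np.triu(T, k=2)             # entries beyond the three central diagonals

print(sv_A)
print(sv_QAQ)                         # same singular values as A
print(np.count_nonzero(outside))      # 0: B^T B is tridiagonal
```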

The final steps to a diagonal A and a diagonal X need more ideas. This problem can’t 
be easy, because underneath we are solving det(S — AI) = 0 for polynomials of degree 
n = 100 or 1000 or more. We certainly don’t use those polynomials ! 

The favorite way to find $\lambda$'s and $\sigma$'s in LAPACK uses simple orthogonal matrices to
approach $Q^TSQ = \Lambda$ and $U^TAV = \Sigma$. We stop when very close to $\Lambda$ and $\Sigma$.


This 2-step approach (zeros first) is built into the commands eig(S) and svd(A).

= REVIEW OF THE KEY IDEAS =


1. The SVD factors A into $U\Sigma V^T$, with r singular values $\sigma_1 \ge \cdots \ge \sigma_r > 0$.

2. The numbers $\sigma_1^2, \ldots, \sigma_r^2$ are the nonzero eigenvalues of $AA^T$ and $A^TA$.

3. The orthonormal columns of U and V are eigenvectors of $AA^T$ and $A^TA$.

4. Those columns hold orthonormal bases for the four fundamental subspaces of A.

5. Those bases diagonalize the matrix: $Av_i = \sigma_iu_i$ for $i \le r$. This is $AV = U\Sigma$.

6. $A = \sigma_1u_1v_1^T + \cdots + \sigma_ru_rv_r^T$ and $\sigma_1$ is the maximum of the ratio $\|Ax\| / \|x\|$.


= WORKED EXAMPLES = 


7.2 A Identify by name these decompositions of A into a sum of columns times rows:


1. Orthogonal columns $u_1\sigma_1, \ldots, u_r\sigma_r$ times orthonormal rows $v_1^T, \ldots, v_r^T$.
2. Orthonormal columns $q_1, \ldots, q_r$ times triangular rows $r_1^T, \ldots, r_r^T$.
3. Triangular columns $l_1, \ldots, l_r$ times triangular rows $u_1^T, \ldots, u_r^T$.


Where do the rank and the pivots and the singular values of A come into this picture?


Solution These three factorizations are basic to linear algebra, pure or applied:
1. Singular Value Decomposition $A = U\Sigma V^T$
2. Gram-Schmidt Orthogonalization A = QR
3. Gaussian Elimination A = LU
You might prefer to separate out singular values $\sigma_i$ and heights $h_i$ and pivots $d_i$ :
1. $A = U\Sigma V^T$ with unit vectors in U and V. The r singular values $\sigma_i$ are in $\Sigma$.
2. A = QHR with unit vectors in Q and diagonal 1's in R. The r heights $h_i$ are in H.
3. A = LDU with diagonal 1's in L and U. The r pivots $d_i$ are in D.


Each $h_i$ tells the height of column i above the plane of columns 1 to i - 1. The volume
of the full n-dimensional box (r = m = n) comes from $A = U\Sigma V^T = LDU = QHR$ :


| det A | = | product of $\sigma$'s | = | product of d's | = | product of h's |.
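
Two of these three products are easy to compare numerically. The sketch assumes NumPy and a random 4 by 4 matrix. The heights are read from the diagonal of R in A = QR. The pivots are left out because plain NumPy has no LU routine.

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((4, 4))

sigma = np.linalg.svd(A, compute_uv=False)
Q, R = np.linalg.qr(A)
heights = np.abs(np.diag(R))          # h_i = |diagonal entries of R| in A = QR

abs_det = abs(np.linalg.det(A))
print(abs_det, np.prod(sigma), np.prod(heights))
```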

7.2 B Show that $\sigma_1 \ge |\lambda|_{\max}$. The largest singular value dominates all eigenvalues.


Solution Start from $A = U\Sigma V^T$. Remember that multiplying by an orthogonal matrix
does not change length: $\|Qx\| = \|x\|$ because $\|Qx\|^2 = x^TQ^TQx = x^Tx = \|x\|^2$.
This applies to Q = U and $Q = V^T$. In between is the diagonal matrix $\Sigma$.
$$\|Ax\| = \|U\Sigma V^Tx\| = \|\Sigma V^Tx\| \le \sigma_1\|V^Tx\| = \sigma_1\|x\|. \tag{14}$$


An eigenvector has $\|Ax\| = |\lambda|\,\|x\|$. So (14) says that $|\lambda|\,\|x\| \le \sigma_1\|x\|$. Then $|\lambda| \le \sigma_1$.


Apply also to the unit vector x = (1, 0, ..., 0). Now Ax is the first column of A.
Then by inequality (14), this column has length $\le \sigma_1$. The other unit vectors give the
other columns in the same way. Every entry must have $|a_{ij}| \le \sigma_1$.


Equation (14) shows again that the maximum value of $\|Ax\|/\|x\|$ equals $\sigma_1$.


Section 11.2 will explain how the ratio $\sigma_{\max}/\sigma_{\min}$ governs the roundoff error in solving
Ax = b. MATLAB warns you if this "condition number" is large. Then x is unreliable.

Problem Set 7.2 


1 Find the eigenvalues of these matrices. Then find singular values from $A^TA$ :


0 4 0 4 
a= 9 a= a 
For each A, construct V from the eigenvectors of $A^TA$ and U from the eigenvectors
of $AA^T$. Check that $A = U\Sigma V^T$.


2 Find $A^TA$ and V and $\Sigma$ and $u_i = Av_i/\sigma_i$ and the full SVD:


| ee 2 T 
asf 2 2] eoz 


3 In Problem 2, show that $AA^T$ is diagonal. Its eigenvectors $u_1, u_2$ are ____. Its
eigenvalues $\sigma_1^2, \sigma_2^2$ are ____. The rows of A are orthogonal but they are not ____.
So the columns of A are not orthogonal.


4 Compute $A^TA$ and $AA^T$ and their eigenvalues and unit eigenvectors for V and U.
Rectangular matrix A= ; ; | 


Check $AV = U\Sigma$ (this decides $\pm$ signs in U). $\Sigma$ has the same shape as A: 2 x 3.


ee 
3 3 
u in the column space. What is o1? Why is there no o2? 


5 (a) The row space of A = | | is 1-dimensional. Find v; in the row space and 
(b) Choose $v_2$ and $u_2$ in V and U. Then $A = U\Sigma V^T = u_1\sigma_1v_1^T$ (one term only).


6 Substitute the SVD for A and $A^T$ to show that $A^TA$ has its eigenvalues in $\Sigma^T\Sigma$ and
$AA^T$ has its eigenvalues in $\Sigma\Sigma^T$. Since a diagonal $\Sigma^T\Sigma$ has the same nonzeros as
$\Sigma\Sigma^T$, we see again that $A^TA$ and $AA^T$ have the same nonzero eigenvalues.


7 If $(A^TA)v = \sigma^2v$, multiply by A. Move the parentheses to get $(AA^T)Av = \sigma^2(Av)$.
If v is an eigenvector of $A^TA$, then ____ is an eigenvector of $AA^T$.


8 Find the eigenvalues and unit eigenvectors $v_1, v_2$ of $A^TA$. Then find $u_1 = Av_1/\sigma_1$ :


$$A = \begin{bmatrix} 1 & 2 \\ 3 & 6 \end{bmatrix} \quad\text{and}\quad A^TA = \begin{bmatrix} 10 & 20 \\ 20 & 40 \end{bmatrix} \quad\text{and}\quad AA^T = \begin{bmatrix} 5 & 15 \\ 15 & 45 \end{bmatrix}$$


Verify that $u_1$ is a unit eigenvector of $AA^T$. Complete the matrices U, $\Sigma$, V.


$$\text{SVD}\qquad \begin{bmatrix} 1 & 2 \\ 3 & 6 \end{bmatrix} = \begin{bmatrix} u_1 & u_2 \end{bmatrix}\begin{bmatrix} \sigma_1 & 0 \\ 0 & 0 \end{bmatrix}\begin{bmatrix} v_1 & v_2 \end{bmatrix}^T$$


9 Write down orthonormal bases for the four fundamental subspaces of this A.


10 (a) Why is the trace of $A^TA$ equal to the sum of all $a_{ij}^2$ ? In Example 3 it is 50.

(b) For every rank-one matrix, why is $\sigma_1^2$ = sum of all $a_{ij}^2$ ?


11 Find the eigenvalues and unit eigenvectors of $A^TA$ and $AA^T$. Keep each $Av = \sigma u$.
Then construct the singular value decomposition and verify that A equals $U\Sigma V^T$.


$$\text{Fibonacci matrix}\qquad A = \begin{bmatrix} 1 & 1 \\ 1 & 0 \end{bmatrix}$$


12 Use the svd part of the MATLAB demo eigshow to find those v's graphically.


13 If $A = U\Sigma V^T$ is a square invertible matrix then $A^{-1}$ = ____.
Check $A^{-1}A$. This shows that the singular values of $A^{-1}$ are $1/\sigma_i$.

Note: The largest singular value of $A^{-1}$ is therefore $1/\sigma_{\min}(A)$. The largest
eigenvalue $|\lambda(A^{-1})|_{\max}$ is $1/|\lambda(A)|_{\min}$. Then equation (14) says that $\sigma_{\min}(A) \le |\lambda(A)|_{\min}$.


14 Suppose $u_1, \ldots, u_n$ and $v_1, \ldots, v_n$ are orthonormal bases for $\mathbf{R}^n$. Construct the
matrix $A = U\Sigma V^T$ that transforms each $v_j$ into $u_j$ to give $Av_1 = u_1, \ldots, Av_n = u_n$.


15 Construct the matrix with rank one that has Av = 12u for $v = \frac{1}{2}(1, 1, 1, 1)$ and
$u = \frac{1}{3}(2, 2, 1)$. Its only singular value is $\sigma_1$ = ____.


16 Suppose A has orthogonal columns $w_1, w_2, \ldots, w_n$ of lengths $\sigma_1, \sigma_2, \ldots, \sigma_n$.
What are U, $\Sigma$, and V in the SVD?


17 Suppose A is a 2 by 2 symmetric matrix with unit eigenvectors $u_1$ and $u_2$. If its
eigenvalues are $\lambda_1 = 3$ and $\lambda_2 = -2$, what are the matrices U, $\Sigma$, $V^T$ in its SVD?

18 If A = QR with an orthogonal matrix Q, the SVD of A is almost the same as the
SVD of R. Which of the three matrices U, $\Sigma$, V is changed because of Q ?


19 Suppose A is invertible (with $\sigma_1 > \sigma_2 > 0$). Change A by as small a matrix as
possible to produce a singular matrix $A_0$. Hint: U and V do not change:


$$\text{From } A = \begin{bmatrix} u_1 & u_2 \end{bmatrix}\begin{bmatrix} \sigma_1 & \\ & \sigma_2 \end{bmatrix}\begin{bmatrix} v_1 & v_2 \end{bmatrix}^T \text{ find the nearest } A_0.$$


20 Find the singular values of A from the command svd(A) or by hand.


$$A = \begin{bmatrix} 1 & 0 \\ 100 & 1 \end{bmatrix}. \quad\text{Why is } \sigma_2 = \frac{1}{\sigma_1} \text{ for this matrix?}$$


21 Why doesn't the SVD for A + I just use $\Sigma + I$ ?


22 If $A = U\Sigma V^T$ then $Q_1AQ_2^T = (Q_1U)\Sigma(Q_2V)^T$. Why will any orthogonal matrices
$Q_1$ and $Q_2$ leave $Q_1U$ = orthogonal matrix and $Q_2V$ = orthogonal matrix? Then $\Sigma$
sees no change in the singular values: $Q_1AQ_2^T$ has the same $\sigma$'s as A.


23 If Q is an orthogonal matrix, why do all its singular values equal 1 ?


24 (a) Find the maximum of $\dfrac{x^TSx}{x^Tx} = \dfrac{3x_1^2 + 2x_1x_2 + 3x_2^2}{x_1^2 + x_2^2}$. What matrix is S ?

(b) Find the maximum of $\dfrac{(x_1 + 4x_2)^2}{x_1^2 + x_2^2}$. For what matrix A is this $\dfrac{\|Ax\|^2}{\|x\|^2}$ ?


25 What are the minimum values of the ratios $\dfrac{x^TSx}{x^Tx}$ and $\dfrac{\|Ax\|^2}{\|x\|^2}$ ? We should take x
to be which eigenvectors of S ? Should x always be an eigenvector of A?


26 Every matrix $A = U\Sigma V^T$ takes circles to ellipses. $AV = U\Sigma$ says that the radius
vectors $v_1$ and $v_2$ of the circle go to the semi-axes $\sigma_1u_1$ and $\sigma_2u_2$ of the ellipse.
Draw the circle and the ellipse for $\theta = 30°$ :


$$V = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix} \qquad U = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix} \qquad \Sigma = \begin{bmatrix} 2 & 0 \\ 0 & 1 \end{bmatrix}$$


Section 7.4 will start with an important SVD picture for 2 by 2 matrices:


A = (rotate) (stretch) (rotate). With symmetry S = (rotate) (stretch) (rotate back).


27 This problem looks for all matrices A with a given column space in $\mathbf{R}^m$ and a given
row space in $\mathbf{R}^n$. Suppose $c_1, \ldots, c_r$ and $b_1, \ldots, b_r$ are bases for those two spaces.
Make them columns of C and B. The goal is to show that A has this form:

$A = CMB^T$ for an r by r invertible matrix M. Hint: Start from $A = U\Sigma V^T$.


The first r columns of U and V must be connected to C and B by invertible matrices,
because they contain bases for the same column space (in U) and row space (in V).

7.3 Principal Component Analysis (PCA by the SVD)


1 Data often comes in a matrix : n samples and m measurements per sample. 


2 Center each row of the matrix A by subtracting the mean from each measurement. 


3 The SVD finds combinations of the data that contain the most information. 


4 Largest singular value $\sigma_1$ ↔ greatest variance ↔ most information in $u_1$.


This section explains a major application of the SVD to statistics and data analysis.
Our examples will come from human genetics and face recognition and finance. The
problem is to understand a large matrix of data (= measurements). For each of n samples we
are measuring m variables. The data matrix $A_0$ has n columns and m rows.

Graphically, the columns of $A_0$ are n points in $\mathbf{R}^m$. After we subtract the average of
each row to reach A, the n points are often clustered along a line or close to a plane (or
other low-dimensional subspace of $\mathbf{R}^m$). What is that line or plane or subspace ?

Let me start with a picture instead of numbers. For m = 2 variables like age and height,
the n points lie in the plane $\mathbf{R}^2$. Subtract the average age and height to center the data.
If the n recentered points cluster along a line, how will linear algebra find that line ?


A is 2 x n (large nullspace)
$AA^T$ is 2 x 2 (small matrix)
$A^TA$ is n x n (large matrix)


Two singular values $\sigma_1 > \sigma_2 > 0$


Figure 7.2: Data points in A are often close to a line in $\mathbf{R}^2$ or a subspace in $\mathbf{R}^m$.


Let me go more carefully in constructing the data matrix. Start with the measurements
in $A_0$ : the sample data. Find the average (the mean) $\mu_1, \mu_2, \ldots, \mu_m$ of each row. Subtract
each mean $\mu_i$ from row i to center the data. The average along each row is now zero, for the
centered matrix A. So the point (0, 0) in Figure 7.2 is now the true center of the n points.


The "sample covariance matrix" is defined by $S = \dfrac{AA^T}{n-1}$.


A shows the distance $a_{ij} - \mu_i$ from each measurement to the row average $\mu_i$.


$(AA^T)_{11}$ and $(AA^T)_{22}$ show the sums of squared distances (they give the sample variances $s_1^2, s_2^2$ after division by n - 1).


$(AA^T)_{12}$ shows the sample covariance $s_{12}$ = (row 1 of A) · (row 2 of A), again before division by n - 1.


The variance is a key number throughout statistics. An average exam score $\mu$ = 85
tells you it was a decent exam. A variance of $s^2$ = 25 (standard deviation s = 5)
means that most grades were in the 80's: closely packed. A sample variance $s^2$ = 225
(s = 15) means that grades were widely scattered. Chapter 12 explains variances.

The covariance of a math exam and a history exam is a dot product of those rows of
A, with average grades subtracted out. Covariance below zero means: One subject strong 
when the other is weak. High covariance means: Both strong or both weak. 

We divide by n — 1 instead of n for reasons known best to statisticians. They tell me 
that one degree of freedom was used by the mean, leaving n — 1. (I think the best plan is 
to agree with them.) In any case n should be a big number to count on reliable statistics. 
Since the rows of A have n entries, the numbers in $AA^T$ have size growing like n and the
division by n — 1 keeps them steady. 


Example 1 Six math and history scores (notice the zero mean in each row)


$$A = \begin{bmatrix} 3 & -4 & 7 & 1 & -4 & -3 \\ 7 & -6 & 8 & -1 & -1 & -7 \end{bmatrix} \text{ has sample covariance } S = \frac{AA^T}{5} = \begin{bmatrix} 20 & 25 \\ 25 & 40 \end{bmatrix}$$
The two rows of A are highly correlated: $s_{12}$ = 25. Above average math went with
above average history. Changing all the signs in row 2 would produce negative covariance
$s_{12}$ = -25. Notice that S has positive trace and determinant; $AA^T$ is positive definite.


The eigenvalues of S are near 57 and 3. So the first rank one piece $\sqrt{57}\,u_1v_1^T$ is much
larger than the second piece $\sqrt{3}\,u_2v_2^T$. The leading eigenvector $u_1$ shows the direction
that you see in the scatter graph of Figure 7.2. That eigenvector is close to $u_1$ = (.6, .8)
and the direction in the graph nearly gives a 6-8-10 or 3-4-5 right triangle.


The SVD of A (centered data) shows the dominant direction in the scatter plot.


The second singular vector $u_2$ is perpendicular to $u_1$. The second singular value
$\sigma_2 \approx \sqrt{3}$ measures the spread across the dominant line. If the data points in A fell exactly
on a line ($u_1$ direction), then $\sigma_2$ would be zero. Actually there would only be $\sigma_1$.
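
The numbers in Example 1 can be reproduced in a few lines. This is a sketch assuming NumPy. It takes the six centered scores to be (3, -4, 7, 1, -4, -3) for math and (7, -6, 8, -1, -1, -7) for history, values that give $s_{12}$ = 25 and eigenvalues near 57 and 3.

```python
import numpy as np

# Six centered math and history scores (each row has zero mean)
A = np.array([[3.0, -4.0, 7.0,  1.0, -4.0, -3.0],
              [7.0, -6.0, 8.0, -1.0, -1.0, -7.0]])
n = A.shape[1]

S = A @ A.T / (n - 1)                 # sample covariance matrix
lam, U = np.linalg.eigh(S)            # ascending eigenvalues
lam, U = lam[::-1], U[:, ::-1]        # largest first

u1 = U[:, 0] * np.sign(U[0, 0])       # fix the sign so the first entry is positive
print(S)
print(lam)     # near 57 and 3
print(u1)      # roughly in the (.6, .8) direction
```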


The Essentials of Principal Component Analysis (PCA) 


PCA gives a way to understand a data plot in dimension $m$ = the number of measured
variables (here age and height). Subtract average age and height ($m = 2$ for $n$ samples)
to center the $m$ by $n$ data matrix $A$. The crucial connection to linear algebra is in the
singular values and singular vectors of $A$. Those come from the eigenvalues $\lambda = \sigma^2$ and
the eigenvectors $u$ of the sample covariance matrix $S = AA^T/(n-1)$.


The total variance in the data is the sum of all eigenvalues and of sample variances $s_i^2$:


Total variance $T = \sigma_1^2 + \cdots + \sigma_m^2 = s_1^2 + \cdots + s_m^2$ = trace (diagonal sum).


The first eigenvector $u_1$ of $S$ points in the most significant direction of the data.
That direction accounts for (or explains) a fraction $\sigma_1^2/T$ of the total variance.


The next eigenvector $u_2$ (orthogonal to $u_1$) accounts for a smaller fraction $\sigma_2^2/T$.


Stop when those fractions are small. You have the $R$ directions that explain most of
the data. The $n$ data points are very near an $R$-dimensional subspace with basis
$u_1$ to $u_R$. These $u$’s are the principal components in $m$-dimensional space.


• $R$ is the “effective rank” of $A$. The true rank $r$ is probably $m$ or $n$: full rank matrix.
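
The steps above can be sketched in a few lines. The function names `explained_fractions` and `effective_rank`, the 95% target, and the synthetic data are all hypothetical choices for illustration, not part of the text.

```python
import numpy as np

def explained_fractions(A0):
    """Center each row of A0, then return the eigenvalues of S and their shares of T."""
    A = A0 - A0.mean(axis=1, keepdims=True)      # subtract row averages
    n = A.shape[1]
    sigma = np.linalg.svd(A, compute_uv=False)   # singular values of centered A
    lam = sigma**2 / (n - 1)                     # eigenvalues of S = A A^T / (n - 1)
    return lam, lam / lam.sum()

def effective_rank(fractions, target=0.95):
    """Smallest R whose leading fractions add up to at least `target`."""
    cumulative = np.cumsum(fractions)
    return int(np.argmax(cumulative >= target)) + 1

# Hypothetical data: m = 3 variables, n = 200 samples lying close to one line
rng = np.random.default_rng(0)
t = rng.normal(size=200)
A0 = np.outer([3.0, 2.0, 1.0], t) + 0.1 * rng.normal(size=(3, 200)) + 5.0

lam, fractions = explained_fractions(A0)
R = effective_rank(fractions)
total_variance = lam.sum()                       # equals the trace of S
print(fractions, R)
```

Here one direction explains nearly all the variance, so the effective rank is 1 even though the true rank of the centered matrix is 3.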


Perpendicular Least Squares


It may not be widely recognized that the best line in Figure 7.2 (the line in the $u_1$ direction)
also solves a problem of perpendicular least squares (= orthogonal regression) :


The sum of squared distances from the points to the line is a minimum. 


Proof. Separate each column $a_j$ into its components along the $u_1$ line and $u_2$ line:


$$\textbf{Right triangles} \qquad \sum_{j=1}^{n} \|a_j\|^2 = \sum_{j=1}^{n} |a_j^T u_1|^2 + \sum_{j=1}^{n} |a_j^T u_2|^2 \tag{1}$$


The sum on the left is fixed by the data points $a_j$ (columns of $A$). The first sum on the right
is $u_1^T AA^T u_1$. So when we maximize that sum in PCA by choosing the eigenvector $u_1$,
we minimize the second sum. That second sum (squared distances from the data points
to the best line) is a minimum for perpendicular least squares.

Ordinary least squares in Chapter 4 reached a linear equation $A^TA\hat{x} = A^Tb$ by using
vertical distances to the best line. PCA produces an eigenvalue problem for $u_1$ by using
perpendicular distances. “Total least squares” will allow for errors in $A$ as well as $b$.
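
A small experiment can make the proof concrete. The sketch below uses made-up centered data in the plane and a hypothetical helper `perp_sq_dist`. It compares the line along $u_1$ with lines at 181 other angles and checks the right-triangle identity (1).

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.normal(size=(2, 50)) * np.array([[3.0], [1.0]])   # hypothetical data, 50 points
A = A - A.mean(axis=1, keepdims=True)                     # center each row

def perp_sq_dist(A, u):
    """Sum of squared perpendicular distances from the columns of A
    to the line through the origin along the unit vector u."""
    along = u @ A                                         # components a_j^T u
    return np.sum(A**2) - np.sum(along**2)

U, s, Vt = np.linalg.svd(A, full_matrices=False)
u1, u2 = U[:, 0], U[:, 1]
best = perp_sq_dist(A, u1)

# Try many other directions (cos t, sin t); none should beat u1
angles = np.linspace(0, np.pi, 181)
others = [perp_sq_dist(A, np.array([np.cos(t), np.sin(t)])) for t in angles]
print(best, min(others))
```

The minimum value equals $\sigma_2^2$ for the centered matrix, which is the second sum in equation (1).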


The Sample Correlation Matrix 


Data analysis works mostly with A (centered data). But the measurements in A might have 
different units like inches and pounds and years and dollars. Changing one set of units 
(inches to meters or years to seconds) would have a big effect on that row of A and S. 
If scaling is a problem, we change from covariance matrix S to correlation matrix C : 


A diagonal matrix $D$ rescales $A$. Each row of $DA$ has length $\sqrt{n-1}$.
The sample correlation matrix $C = DAA^TD/(n-1)$ has 1’s on its diagonal.


Chapter 12 on Probability and Statistics will introduce the expected covariance 
matrix V and the expected correlation matrix (with diagonal 1’s). Those use probabili- 
ties instead of actual measurements. The covariance matrix predicts the spread of future 
measurements around their mean, while A and the sample covariances S and the scaled 
correlation matrix C = DSD use real data. All are highly important—a big connection 
between statistics and the linear algebra of positive definite matrices and the SVD. 
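
As a minimal sketch of the rescaling $C = DSD$, the code below takes a 2 by 2 covariance matrix with $s_{12} = 25$ (assumed here to be $S$ from Example 1) and chooses $D$ with entries $1/s_i$. The function name is hypothetical.

```python
import numpy as np

def correlation_from_covariance(S):
    """Return C = D S D, where D is diagonal with entries 1 / sqrt(S_ii)."""
    D = np.diag(1.0 / np.sqrt(np.diag(S)))
    return D @ S @ D

S = np.array([[20.0, 25.0],
              [25.0, 40.0]])      # assumed sample covariance matrix
C = correlation_from_covariance(S)
print(C)
```

The off-diagonal entry of `C` is $25/\sqrt{20 \cdot 40} \approx 0.88$, and it does not change when either row of the data is measured in different units.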


Genetic Variation in Europe 


We can follow changes in human populations by looking at genomes. To manage the huge 
amount of data, one good way to see genetic variation is from SNP’s. The uncommon 
alleles (bases A/C/T/G in a pair from father and mother) are counted by the SNP: 


SNP =0 No change from the common base in that population: normal genotype 
SNP = 1 The base pair shows one change from the usual pair 
SNP =2 Both bases are the less common allele 


The uncentered matrix $A_0$ has a column for every person and a row for every base pair.
The entries are mostly 0, quite a few 1, not so many 2. We don’t test all 3 billion pairs.
After subtracting row averages from $A_0$, the eigenvectors of $AA^T$ are extremely revealing.
In Figure 7.3 the first singular vectors of $A$ almost reproduce a map of Europe.

This means: The SNP’s from France and Germany and Italy are quite different. Even
from the French and German and Italian parts of Switzerland those “snips” are different! 
Only Spain and Portugal are surprisingly confounded and harder to separate. More often 
than not, the DNA of an individual reveals his birthplace within 300 kilometers or 200 
miles. A mixture of grandparents usually places the grandchild between their origins. 


Figure 7.3: Nature (2008) Novembre et al: vol. 456 pp. 98-101 / doi:10.1038/nature07331.
(Countries are identified by small circles.)


What is the significant message? If we test genomes to understand how they correlate 
with diseases, we must not forget their spatial variation. Without correcting for geography, 
what looks medically significant can be very misleading. Confounding is a serious problem 
in medical genetics that PCA and population genetics can help to solve—to remove effects 
due to geography that don’t have medical importance. 


In fact “spatial statistics” is a tricky world. Example: Every matrix with three diagonals
of $1, C, 1$ shows a not surprising influence of next door neighbors (from the 1’s). But its
singular vectors have sine and cosine oscillations going across the map, independent of $C$.
You might think those are true wave-like variations but they can be meaningless.


Maybe statistics produces more arguments than mathematics does? Reducing big data 
to a single small “P-value” can be instructive or it can be extremely deceptive. The ex- 
pression P-value appears in many articles. P stands for the probability that an observation 
is consistent with the null hypothesis (= pure chance). If you see 5 heads in a row, the 
probability is P = 1/32 that this came by chance from a fair coin (or P = 2/32 if your 
observation is taken to be 5 heads or 5 tails in a row). Often a P-value below 0.05 makes 
the null hypothesis doubtful—maybe a crook is flipping the coin. As here, P-values are 
not the most reliable guides in statistics—but they are extremely convenient. 


Eigenfaces


Recognizing faces would not seem to depend—at first glance—on linear algebra. But an 
early and well publicized application of the SVD was to face recognition. We are not 
compressing an image, we are identifying it. 

The plan is to start with a “training set” $A_0$ of $n$ images of a wide variety of faces.
Each image becomes a very long vector by stacking all pixel grayscales into a column.
Then $A_0$ must be centered: subtract the average of every column of $A_0$ to reach $A$.

The singular vector $v_1$ of this $A$ tells us the combination of known faces that best
identifies a new face. Then $v_2$ tells us the next best combination.

Probably we will use the $R$ best vectors $v_1, \ldots, v_R$ with largest singular values
$\sigma_1 > \cdots > \sigma_R$ of $A$. Those identify new faces more accurately than any other
$R$ vectors. Perhaps $R = 100$ of those eigenfaces $Av$ will capture nearly all the variance in
the training set. Those $R$ eigenfaces span “face space”.

This plan of attack was suggested by Matthew Turk and Alex Pentland. It developed 
the suggestion by Sirovich and Kirby to use PCA in compressing images of faces. I learned 
a lot from Jeff Jauregui’s description on the Web. His summary is this: PCA provides a 
mechanism to recognize geometric/photometric similarity through algebraic means. 
He assembled the first principal component (first singular vector) into the first eigenface. 
Of course the average of each column was added back or you wouldn’t see a face! 


Note PCA is compared to NMF in a fascinating letter to Nature (Lee and Seung, 
vol. 401, 21 Oct. 1999). Nonnegative Matrix Factorization does not allow the negative 
entries that always appear in the singular vectors v. So everything adds—which needs 
more vectors but they are often more meaningful. 




Figure 7.4: Eigenfaces pick out hairline and mouth and eyes and shape. 


Applications of Eigenfaces


The first commercial use of PCA face recognition was for law enforcement and security. 
An early test at Super Bowl 35 in Tampa produced a very negative reaction from the crowd! 
The test was without the knowledge of the fans. Newspapers began calling it the “Snooper 
Bowl”. I don’t think the original eigenface idea is still used commercially (even in secret). 

New applications of the SVD approach have come for other identification problems: 
Eigenvoices, Eigengaits, Eigeneyes, Eigenexpressions. I learned this from Matthew Turk 
(now in Santa Barbara, originally an MIT grad student. He told me he was in my class). 
The original eigenfaces in his thesis had problems accounting for rotation and scaling and 
lighting in the facial images. But the key ideas live on. 

In the end, face space is nonlinear. So eventually we want nonlinear PCA. 


Model Order Reduction 


For a large-scale dynamic problem, the computational cost can become unmanageable. 
“Dynamic” means that the solution u(t) evolves as time goes forward. Fluid flow, 
chemical reactions, wave propagation, biological growth, electronic systems, these prob- 
lems are everywhere. A reduced model tries to identify important states of the system. 
From a reduced problem we compute the needed information at much lower cost. 

Model reduction is a truly important computational approach. Many good ideas have 
been proposed to reduce the original large problem. One simple and often useful idea is 
to take “snapshots” of the flow, put them in a matrix A, find the principal components 
(the left singular vectors of A), and work in their much smaller subspace: 


A snapshot is a column vector that describes the state of the system 
It can be an approximation to a typical true state u(t*) 
From n snapshots, build a matrix A whose columns span a useful range of states 


Now find the first $R$ left singular vectors $u_1$ to $u_R$ of $A$. They are a basis for a Proper
Orthogonal Decomposition (POD basis). In practice we choose $R$ so that


Variance $\approx$ Energy $\qquad \sigma_1^2 + \cdots + \sigma_R^2$ is 99% or 99.9% of $\sigma_1^2 + \cdots + \sigma_n^2$.


These vectors are an optimal basis for reconstructing the snapshots in $A$. If those snapshots
are well chosen, then combinations of $u_1$ to $u_R$ will be close to the exact solution $u(t)$ for
desired times $t$ and parameters $p$.

So much depends on the snapshots! SIAM Review 2015 includes an excellent survey
by Benner, Gugercin, and Willcox. The SVD compresses data as well as images.
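
Here is one way the snapshot idea might look in code. Everything specific in this sketch is an assumption made for illustration: the function name `pod_basis`, the two decaying sine modes used as a stand-in for a real simulation, the noise level, and the 99.9% energy threshold.

```python
import numpy as np

def pod_basis(snapshots, energy=0.99):
    """Return the first R left singular vectors that capture `energy` of sum(sigma^2)."""
    U, s, _ = np.linalg.svd(snapshots, full_matrices=False)
    cumulative = np.cumsum(s**2) / np.sum(s**2)
    R = int(np.argmax(cumulative >= energy)) + 1
    return U[:, :R], s, R

# Hypothetical snapshots: two spatial modes decaying at different rates, plus small noise
x = np.linspace(0, 1, 200)
times = np.linspace(0, 2, 30)
rng = np.random.default_rng(0)
A = np.array([np.exp(-t) * np.sin(np.pi * x)
              + 0.3 * np.exp(-4 * t) * np.sin(2 * np.pi * x) for t in times]).T
A += 1e-4 * rng.normal(size=A.shape)          # each column is one snapshot

Ur, s, R = pod_basis(A, energy=0.999)
reconstruction = Ur @ (Ur.T @ A)              # project snapshots onto the POD basis
rel_err = np.linalg.norm(A - reconstruction) / np.linalg.norm(A)
print(R, rel_err)
```

With 200 unknowns per snapshot, two basis vectors are enough to reproduce these 30 snapshots to within the noise.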


Searching the Web 


We believe that Google creates rankings by a walk that follows web links. When this 
walk goes often to a site, the ranking is high. The frequency of visits gives the leading 
eigenvector ($\lambda = 1$) of the “Web matrix”—the largest eigenvalue problem ever solved.


That Markov matrix has more than 3 billion rows and columns, from 3 billion web sites. 

Many of the important techniques are well-kept secrets of Google. Probably they
start with an earlier eigenvector as a first approximation, and they run the random 
walk very fast. To get a high ranking, you want a lot of links from important sites. 


Here is an application of the SVD to web search engines. When you google a word, 
you get a list of web sites in order of importance. You could try typing “four subspaces”. 

The HITS algorithm was an early proposal to produce that ranked list. 
It begins with about 200 sites found from an index of key words. After that we look 
only at links between pages. Search engines are link-based more than content-based. 

Start with the 200 sites and all sites that link to them and all sites they link to. That is 
our list, to be put in order. Importance can be measured by links out and links in. 


1. The site may be an authority: Links come in from many sites. Especially from hubs. 


2. The site may be a hub: Links go out to many sites in the list. Especially to authorities. 


We want numbers $x_1, \ldots, x_N$ to rank the authorities and $y_1, \ldots, y_N$ to rank the hubs.
Start with a simple count: $x_i^0$ and $y_i^0$ count the links into and out of site $i$.


Here is the point: A good authority has links from important sites (like hubs). Links
from universities count more heavily than links from friends. A good hub is linked to
important sites (like authorities). A link to amazon.com unfortunately means more than
a link to wellesleycambridge.com. The raw counts $x^0$ and $y^0$ are updated to $x^1$ and $y^1$
by taking account of good links (measuring their quality by $x^0$ and $y^0$):


Authority/Hub $\qquad x_i^1 \,/\, y_i^1$ = Add up $y_j^0 \,/\, x_j^0$ for all links into $i$ / out from $i$ $\qquad$ (2)


In matrix language those are $x^1 = A^Ty^0$ and $y^1 = Ax^0$. The matrix $A$ contains 1’s and
0’s, with $a_{ij} = 1$ when $i$ links to $j$. In the language of graphs, $A$ is an “adjacency matrix”
for the Web (an enormous matrix). The new $x^1$ and $y^1$ give better rankings, but not the
best. Take another step like (2), to reach $x^2$ and $y^2$ from $A^TAx^0$ and $AA^Ty^0$:


$$\textbf{Authority} \quad x^2 = A^Ty^1 = A^TAx^0 \qquad \textbf{Hub} \quad y^2 = Ax^1 = AA^Ty^0. \tag{3}$$


In two steps we are multiplying by $A^TA$ and $AA^T$. Twenty steps will multiply by $(A^TA)^{10}$
and $(AA^T)^{10}$. When we take powers, the largest eigenvalue $\sigma_1^2$ begins to dominate.
The vectors $x$ and $y$ line up with the leading eigenvectors $v_1$ and $u_1$ of $A^TA$ and $AA^T$.
We are computing the top terms in the SVD, by the power method that is discussed in
Section 11.3. It is wonderful that linear algebra helps to understand the Web.
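
The update steps (2) and (3) can be tried on a tiny example. The 4-site link matrix below is invented for illustration, and the normalization at each step is an added convenience (it keeps the numbers bounded without changing the rankings). After enough steps the authority and hub scores should agree with the leading singular vectors of $A$.

```python
import numpy as np

# Hypothetical web of 4 sites: A[i, j] = 1 when site i links to site j
A = np.array([[0, 1, 1, 1],
              [0, 0, 1, 1],
              [1, 0, 0, 1],
              [0, 0, 1, 0]], dtype=float)

x = A.sum(axis=0)    # x^0: number of links into each site (authority count)
y = A.sum(axis=1)    # y^0: number of links out of each site (hub count)

for _ in range(100):
    x, y = A.T @ y, A @ x          # x^{k+1} = A^T y^k  and  y^{k+1} = A x^k
    x /= np.linalg.norm(x)         # rescale so the entries stay bounded
    y /= np.linalg.norm(y)

U, s, Vt = np.linalg.svd(A)
v1 = np.abs(Vt[0])       # leading right singular vector: authority scores
u1 = np.abs(U[:, 0])     # leading left singular vector: hub scores
print(x, v1)
print(y, u1)
```

The absolute values only remove the sign ambiguity of the computed singular vectors, since the leading vectors of a nonnegative matrix can be taken nonnegative.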


This HITS algorithm is described in the 1999 Scientific American (June 16). But I 
don’t think the SVD is mentioned there. .. The excellent book by Langville and Meyer, 
Google’s PageRank and Beyond, explains in detail the science of search engines. 


PCA in Finance: The Dynamics of Interest Rates


The mathematics of finance constantly applies linear algebra and PCA. We choose one 
application: the yield curve for Treasury securities. The “yield” is the interest rate paid 
on the bonds or notes or bills. That rate depends on time to maturity. For longer bonds 
(3 years to 20 years) the rate increases with length. The Federal Reserve adjusts short term 
yields to slow or stimulate the economy. This is the yield curve, used by risk managers and 
traders and investors. 


Here is data for the first 6 business days of 2001—each column is a yield curve for 
investments on a particular day. The time to maturity is the “tenor”. The six columns at 
the left are the interest rates, changing from day to day. The five columns at the right are 
interest rate differences between days, with the mean difference subtracted from each row. 
This is the centered matrix A with its rows adding to zero. A real world application 
might start with 252 business days instead of 5 or 6 (a year instead of a week). 


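Table 1. U.S. Treasury Yields: 6 Days and 5 Centered Daily Differences


Matrix $A$ in basis points (0.01%). The six yield columns (Jan 3 to Jan 10) and one of the
five difference columns (Jan 4 to Jan 10) are not legible in this copy. The four legible
columns are listed in printed order, with rows assumed to follow the tenor order of the
loadings table in this example. Every row of $A$ adds to zero, so the missing entry in each
row is minus the sum of the four entries shown.

Tenor
3 MO     -5.4   -19.4   -12.4    19.6
6 MO     -4.6   -14.6   -12.6    14.4
1 YR      1.0   -14.0   -14.0     9.0
2 YR      9.6   -10.4   -16.4     2.6
3 YR     13.4   -10.6   -17.6     1.4
5 YR     18.6   -11.4   -15.4    -0.4
7 YR     20.8   -11.2   -14.2     0.8
10 YR    20.8   -12.2   -11.2    -0.2
20 YR    14.6    -7.4    -7.4     0.6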

With five columns we might expect five singular values. But the five column vectors add to
the zero vector (since every row of $A$ adds to zero after centering). So $S = AA^T/(5-1)$
has four nonzero eigenvalues $\sigma_1^2 > \sigma_2^2 > \sigma_3^2 > \sigma_4^2$. Here are the singular values $\sigma_i$ and
their squares $\sigma_i^2$ and the fractions of the total variance $T = \sigma_1^2 + \cdots + \sigma_4^2$ = trace of $S$
that are “explained” by each principal component (each eigenvector $u_i$ of $S$).


                              $\sigma_i$     $\sigma_i^2$    $\sigma_i^2/T$
Principal component $u_1$     36.39    1323.9    .7536
Principal component $u_2$     19.93     397.2    .2261
Principal component $u_3$      5.85      34.2    .0195
Principal component $u_4$      1.19       1.4    .0008
Principal component $u_5$      0.00       0.0    .0000
Total                                  1756.7   1.0000


A “scree plot” graphs those fractions $\sigma_i^2/T$ dropping quickly to zero. In a larger problem
you often see fast dropoff followed by a flatter part at the bottom (near $\sigma^2 = 0$). Locating
the elbow between those two parts (significant and insignificant PC’s) is important.
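
The numbers in this example can be checked with a few lines of NumPy. This is a sketch under one stated assumption: only four of the five columns of $A$ are taken from the printed table, and the fifth column is set to minus the row sum (every row of the centered matrix adds to zero). The order of the columns does not affect $S = AA^T/4$.

```python
import numpy as np

# Four columns of A as printed (basis points), rows from 3 months to 20 years
known = np.array([
    [-5.4, -19.4, -12.4, 19.6],
    [-4.6, -14.6, -12.6, 14.4],
    [ 1.0, -14.0, -14.0,  9.0],
    [ 9.6, -10.4, -16.4,  2.6],
    [13.4, -10.6, -17.6,  1.4],
    [18.6, -11.4, -15.4, -0.4],
    [20.8, -11.2, -14.2,  0.8],
    [20.8, -12.2, -11.2, -0.2],
    [14.6,  -7.4,  -7.4,  0.6],
])
# Assumption: the remaining column makes every row sum to zero
A = np.hstack([known, -known.sum(axis=1, keepdims=True)])

n = A.shape[1]
S = A @ A.T / (n - 1)                    # 9 by 9 sample covariance matrix
lam = np.linalg.eigvalsh(S)[::-1]        # eigenvalues of S, largest first
T = np.trace(S)                          # total variance
fractions = lam / T                      # the values a scree plot would graph
print(np.sqrt(lam[:4]), T, fractions[:4])
```

Only four eigenvalues are nonzero, and the first principal component alone explains about three quarters of the total variance.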

We also aim to understand each principal component. Those singular vectors $u_i$ of $A$
are eigenvectors of $S$. The entries in those vectors are the “loadings”. Here are $u_1$ to $u_5$
for this yield curve example (with $Su_5 = 0$).

          $u_1$     $u_2$     $u_3$     $u_4$     $u_5$
3 MO     0.383    0.529   -0.478    0.060    0.084
6 MO     0.336    0.436   -0.046    0.210   -0.263
1 YR     0.358    0.263    0.225   -0.491    0.237
2 YR     0.352   -0.028    0.460    0.096    0.242
3 YR     0.371   -0.131    0.430    0.258   -0.555
5 YR     0.349   -0.293    0.117   -0.188    0.446
7 YR     0.323   -0.365   -0.228    0.459    0.081
10 YR    0.297   -0.3878  -0.351   -0.579   -0.470
20 YR    0.184   -0.280   -0.361    0.227    0.268

Those five $u$’s are orthonormal. They give bases for the four-dimensional column space
of $A$ and the one-dimensional nullspace of $A^T$. What financial meaning do they have?


$u_1$ measures a weighted average of the daily changes in the 9 yields
$u_2$ gauges the daily change in the yield spread between long and short bonds
$u_3$ shows daily changes in the curvature (short and long bonds versus medium)


These graphs show the nine loadings on $u_1, u_2, u_3$ above from 3 months to 20 years.


The output from a typical code (written in R) will include two more tables—which are 
going on the book’s website. One will show the right singular vectors $v_i$ of $A$. These are
eigenvectors of $A^TA$. They are proportional to the vectors $A^Tu$. They have 5 components
and they show the movement of yields and short-long spreads during the week.

The total variance $T = 1756.7$ (the trace $\sigma_1^2 + \sigma_2^2 + \sigma_3^2 + \sigma_4^2$ of $S$) is also the sum of
the diagonal entries of $S$. Those are the sample variances of the rows of $A$. Here they are:


$s_1^2 + \cdots + s_9^2 = 313.3 + 225.8 + 199.5 + 172.3 + 195.8 + 196.8 + 193.7 + 178.7 + 80.8 = 1756.7.$
Every $s^2$ is below $\sigma_1^2$. And 1756.7 is also the trace of $A^TA/(n-1)$: column variances.


Note that this PCA section 7.3 is working with centered rows in $A$. In some applications
(like finance), the matrix is usually transposed and the columns are centered. Then
the sample covariance matrix $S$ uses $A^TA$, and the $v$’s are the more important principal
components. Linear algebra with practical interpretations tells us so much.


Problem Set 7.3


1 Suppose $A_0$ holds these 2 measurements of 5 samples:


54321 
w=? w T] 


Find the average of each row and subtract it to produce the centered matrix $A$. Compute
the sample covariance matrix $S = AA^T/(n-1)$ and find its eigenvalues $\lambda_1$
and $\lambda_2$. What line through the origin is closest to the 5 samples in columns of $A$?


2 Take the steps of Problem 1 for this 2 by 6 matrix $A_0$ :
tod OA 0 
ja | taai 


3 The sample variances $s_1^2, s_2^2$ and the sample covariance $s_{12}$ are the entries of $S$.


12 à | ? What is oj ? 


What is S (after subtracting means) when Ag = | 5 2 9 


4 From the eigenvectors of $S = AA^T$, find the line (the $u_1$ direction through the
center point) and then the plane ($u_1$, $u_2$ directions) closest to these four points in
three-dimensional space :


t=) 2 0 
A=|0 0 2=2 
IRI E 


5 From this sample covariance matrix $S$, find the correlation matrix $DSD$ with 1’s
down its main diagonal. $D$ is a positive diagonal matrix that produces those 1’s.


$$S = \begin{bmatrix} 4 & 2 & 0 \\ 2 & 4 & 1 \\ 0 & 1 & 4 \end{bmatrix}$$


6 Choose the diagonal matrix $D$ that produces $DSD$ and find the correlations $c_{ij}$:


$$S = \begin{bmatrix} s_1^2 & s_{12} & s_{13} \\ s_{12} & s_2^2 & s_{23} \\ s_{13} & s_{23} & s_3^2 \end{bmatrix} \qquad DSD = \begin{bmatrix} 1 & c_{12} & c_{13} \\ c_{12} & 1 & c_{23} \\ c_{13} & c_{23} & 1 \end{bmatrix}$$


7 Suppose $A_0$ is a 5 by 10 matrix with average grades for 5 courses over 10 years. How
would you create the centered matrix $A$ and the sample covariance matrix $S$ ? When
you find the leading eigenvector of $S$, what does it tell you?




7.4 The Geometry of the SVD 


1 A typical square matrix $A = U\Sigma V^T$ factors into (rotation) (stretching) (rotation).

2 The geometry shows how $A$ transforms vectors $x$ on a circle to vectors $Ax$ on an ellipse.

3 The norm of $A$ is $\|A\| = \sigma_1$. This singular value is its maximum growth factor $\|Ax\| / \|x\|$.

4 Polar decomposition factors $A$ into $QS$: rotation $Q = UV^T$ times stretching $S = V\Sigma V^T$.

5 The pseudoinverse $A^+ = V\Sigma^+U^T$ brings $Ax$ in the column space back to $x$ in the row space.


The SVD separates a matrix into three steps: (orthogonal) × (diagonal) × (orthogonal).
Ordinary words can express the geometry behind it: (rotation) × (stretching) × (rotation).


$U\Sigma V^Tx$ starts with the rotation to $V^Tx$. Then $\Sigma$ stretches that vector to $\Sigma V^Tx$, and $U$
rotates to $Ax = U\Sigma V^Tx$. Here is the picture.


Figure 7.5: $U$ and $V$ are rotations and possible reflections. $\Sigma$ stretches circle to ellipse.


Admittedly, this picture applies to a 2 by 2 matrix. And not every 2 by 2 matrix, 
because U and V didn’t allow for a reflection—all three matrices have determinant > 0. 
This A would have to be invertible because the three steps are shown as invertible: 


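$$\begin{bmatrix} a & b \\ c & d \end{bmatrix} = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix} \begin{bmatrix} \sigma_1 & \\ & \sigma_2 \end{bmatrix} \begin{bmatrix} \cos\phi & \sin\phi \\ -\sin\phi & \cos\phi \end{bmatrix} = U\Sigma V^T \tag{1}$$


The four numbers $a, b, c, d$ in the matrix $A$ led to four numbers $\theta, \sigma_1, \sigma_2, \phi$ in its SVD.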
This picture will guide us to three neat ideas in the algebra of matrices: 
1 The norm || A|| of a matrix—its maximum growth factor. 
2 The polar decomposition A = QS—orthogonal Q times positive definite S. 


3 The pseudoinverse $A^+$—the best inverse when the matrix $A$ is not invertible.


The Norm of a Matrix


If I choose one crucial number in the picture it is $\sigma_1$. That number is the largest growth
factor of any vector $x$. If you follow the vector $v_1$ on the left, you see it rotate to $(1, 0)$
and stretch to $(\sigma_1, 0)$ and finally rotate to $\sigma_1u_1$. The statement $Av_1 = \sigma_1u_1$ is exactly the
SVD equation. This largest singular value $\sigma_1$ is the “norm” of the matrix $A$.


$$\textbf{The norm } \|A\| \textbf{ is the largest ratio } \frac{\|Ax\|}{\|x\|} \qquad \|A\| = \max_{x \neq 0} \frac{\|Ax\|}{\|x\|} = \sigma_1 \tag{2}$$


MATLAB uses norm(x) for vector lengths and the same word norm(A) for matrix norms.
The math symbols have double bars: $\|x\|$ and $\|A\|$. Here $\|x\|$ means the standard length
of a vector with $\|x\|^2 = |x_1|^2 + \cdots + |x_n|^2$. The matrix norm comes from this vector
norm when $x = v_1$ and $Ax = \sigma_1u_1$ and $\|Ax\| / \|x\| = \sigma_1$ = largest ratio = $\|A\|$.


Two valuable properties of that number norm(A) come directly from its definition:


$$\textbf{Triangle inequality} \quad \|A + B\| \le \|A\| + \|B\| \qquad \textbf{Product inequality} \quad \|AB\| \le \|A\|\,\|B\| \tag{3}$$


The definition (2) says that $\|Ax\| \le \|A\|\,\|x\|$ for every vector $x$. That is what we
know! Then the triangle inequality for vectors leads to the triangle inequality for matrices:


For vectors $\quad \|(A + B)x\| \le \|Ax\| + \|Bx\| \le \|A\|\,\|x\| + \|B\|\,\|x\|$.
Divide this by $\|x\|$. Take the maximum over all $x$. Then $\|A + B\| \le \|A\| + \|B\|$.


The product inequality comes quickly from $\|ABx\| \le \|A\|\,\|Bx\| \le \|A\|\,\|B\|\,\|x\|$.
Again divide by $\|x\|$. Take the maximum over all $x$. The result is $\|AB\| \le \|A\|\,\|B\|$.
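
These properties are easy to test numerically. The sketch below uses random matrices (an arbitrary choice) and NumPy's `np.linalg.norm(A, 2)`, which computes the matrix 2-norm, in place of MATLAB's `norm(A)`.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(4, 3))
B = rng.normal(size=(4, 3))
C = rng.normal(size=(3, 5))

norm_A = np.linalg.norm(A, 2)                    # matrix 2-norm of A
sigma1 = np.linalg.svd(A, compute_uv=False)[0]   # largest singular value

# Growth factors ||Ax|| / ||x|| for 1000 random vectors x never exceed sigma_1
X = rng.normal(size=(3, 1000))
ratios = np.linalg.norm(A @ X, axis=0) / np.linalg.norm(X, axis=0)

triangle_ok = np.linalg.norm(A + B, 2) <= norm_A + np.linalg.norm(B, 2) + 1e-12
product_ok = np.linalg.norm(A @ C, 2) <= norm_A * np.linalg.norm(C, 2) + 1e-12
print(norm_A, sigma1, ratios.max(), triangle_ok, product_ok)
```

Random vectors come close to the maximum ratio but only $x = v_1$ reaches it exactly.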


Example 1 A rank-one matrix $A = uv^T$ is as basic as we can get. It has one nonzero
eigenvalue $\lambda_1$ and one nonzero singular value $\sigma_1$. Neatly, its eigenvector is $u$ and its
singular vectors (left and right) are $u$ and $v$.


Eigenvector $\qquad Au = (uv^T)u = u(v^Tu) = \lambda_1u \qquad$ So $\lambda_1 = v^Tu$.

Singular vector $\qquad A^TAv = (vu^T)(uv^T)v = v(u^Tu)(v^Tv) = \sigma_1^2v \qquad$ So $\sigma_1 = \|u\|\,\|v\|$.


It makes you feel good that $|\lambda_1| \le \sigma_1$ is exactly the Schwarz inequality $|v^Tu| \le \|u\|\,\|v\|$.


How do we know that $|\lambda| \le \sigma_1$? The eigenvector for $Ax = \lambda x$ will give the ratio
$\|Ax\| / \|x\| = \|\lambda x\| / \|x\|$ which is $|\lambda|$. The maximum ratio $\sigma_1$ can’t be less than $|\lambda|$.

Is it also true that $|\lambda_2| \le \sigma_2$? No. That is completely wrong. In fact a 2 by 2 matrix
will have $|\det A| = |\lambda_1\lambda_2| = \sigma_1\sigma_2$. In this case $|\lambda_1| \le \sigma_1$ will force $|\lambda_2| \ge \sigma_2$.
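
Both statements can be checked on small matrices. The vectors `u` and `v` and the triangular matrix `B` below are illustrative choices, not taken from the text.

```python
import numpy as np

# Rank-one matrix A = u v^T: eigenvalue v^T u, singular value ||u|| ||v||
u = np.array([1.0, 2.0, 2.0])
v = np.array([2.0, 0.0, 1.0])
A = np.outer(u, v)

eig = np.linalg.eigvals(A)
lam1 = np.real(eig[np.argmax(np.abs(eig))])       # the one nonzero eigenvalue
sigma1 = np.linalg.svd(A, compute_uv=False)[0]

# A 2 by 2 matrix: |lambda_1| <= sigma_1 forces |lambda_2| >= sigma_2
B = np.array([[3.0, 0.0],
              [4.0, 5.0]])
lam_B = np.sort(np.abs(np.linalg.eigvals(B)))[::-1]    # 5 and 3
sig_B = np.linalg.svd(B, compute_uv=False)             # sqrt(45) and sqrt(5)
print(lam1, sigma1, lam_B, sig_B)
```

For `B` the products agree: $5 \cdot 3 = \sqrt{45}\,\sqrt{5} = 15 = |\det B|$.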

The closest rank $k$ matrix to $A$ is $A_k = \sigma_1u_1v_1^T + \cdots + \sigma_ku_kv_k^T$
This is the key fact in matrix approximation: The Eckart-Young-Mirsky Theorem says that


$\|A - B\| \ge \|A - A_k\| = \sigma_{k+1}$ for all matrices $B$ of rank $k$.


To me this completes the Fundamental Theorem of Linear Algebra. The $v$’s and $u$’s give
orthonormal bases for the four fundamental subspaces, and the first $k$ $v$’s and $u$’s and $\sigma$’s
give the best matrix approximation to $A$.
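
A numerical illustration of the theorem, as a sketch only: the random 6 by 5 matrix, the choice $k = 2$, and the helper name `best_rank_k` are assumptions made for this example. Random rank-$k$ competitors cannot prove the theorem, but none of them should do better than $A_k$.

```python
import numpy as np

def best_rank_k(A, k):
    """Keep the first k terms sigma_i u_i v_i^T of the SVD."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return (U[:, :k] * s[:k]) @ Vt[:k, :]

rng = np.random.default_rng(0)
A = rng.normal(size=(6, 5))
s = np.linalg.svd(A, compute_uv=False)

k = 2
Ak = best_rank_k(A, k)
err = np.linalg.norm(A - Ak, 2)          # should equal sigma_{k+1}, stored as s[k]

# Compare with 200 random rank-k matrices B = X Y
worst_margin = min(
    np.linalg.norm(A - rng.normal(size=(6, k)) @ rng.normal(size=(k, 5)), 2) - err
    for _ in range(200)
)
print(err, s[k], worst_margin)
```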


Polar Decomposition A = QS 


Every complex number $x + iy$ has the polar form $re^{i\theta}$. A number $r \ge 0$ multiplies
a number $e^{i\theta}$ on the unit circle. We have $x + iy = r\cos\theta + ir\sin\theta = r(\cos\theta + i\sin\theta) = re^{i\theta}$.
Think of these numbers as 1 by 1 matrices. Then $e^{i\theta}$ is an orthogonal matrix $Q$
and $r \ge 0$ is a positive semidefinite matrix (call it $S$). The polar decomposition extends
the same idea to $n$ by $n$ matrices: orthogonal times positive semidefinite, $A = QS$.


Every real square matrix can be factored into A = QS, where Q is orthogonal 
and S is symmetric positive semidefinite. If A is invertible, S is positive definite. 


For the proof we just insert $V^TV = I$ into the middle of the SVD:

$$\textbf{Polar decomposition} \qquad A = U\Sigma V^T = (UV^T)(V\Sigma V^T) = (Q)(S). \tag{4}$$


The first factor $UV^T$ is $Q$. The product of orthogonal matrices is orthogonal. The second
factor $V\Sigma V^T$ is $S$. It is positive semidefinite because its eigenvalues are in $\Sigma$.

If $A$ is invertible then $\Sigma$ and $S$ are also invertible. $S$ is the symmetric positive
definite square root of $A^TA$, because $S^2 = V\Sigma^2V^T = A^TA$. So the eigenvalues of
$S$ are the singular values of $A$. The eigenvectors of $S$ are the singular vectors $v$ of $A$.

There is also a polar decomposition $A = KQ$ in the reverse order. $Q$ is the same but
now $K = U\Sigma U^T$. Then $K$ is the symmetric positive definite square root of $AA^T$.


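Example 2 The SVD example in Section 7.2 was $A = \begin{bmatrix} 3 & 0 \\ 4 & 5 \end{bmatrix} = U\Sigma V^T$. Find the
factors $Q$ and $S$ (rotation and stretch) in the polar decomposition $A = QS$.


Solution I will just copy the matrices $U$ and $\Sigma$ and $V$ from Section 7.2:

$$Q = UV^T = \frac{1}{\sqrt{20}} \begin{bmatrix} 1 & -3 \\ 3 & 1 \end{bmatrix} \begin{bmatrix} 1 & 1 \\ -1 & 1 \end{bmatrix} = \frac{1}{\sqrt{20}} \begin{bmatrix} 4 & -2 \\ 2 & 4 \end{bmatrix} = \frac{1}{\sqrt{5}} \begin{bmatrix} 2 & -1 \\ 1 & 2 \end{bmatrix}$$

$$S = V\Sigma V^T = \frac{\sqrt{5}}{2} \begin{bmatrix} 1 & -1 \\ 1 & 1 \end{bmatrix} \begin{bmatrix} 3 & \\ & 1 \end{bmatrix} \begin{bmatrix} 1 & 1 \\ -1 & 1 \end{bmatrix} = \sqrt{5} \begin{bmatrix} 2 & 1 \\ 1 & 2 \end{bmatrix}. \quad \text{Then } A = QS.$$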


In mechanics, the polar decomposition separates the rotation (in $Q$) from the stretching
(in $S$). The eigenvalues of $S$ give the stretching factors as in Figure 7.5. The eigenvectors
of $S$ give the stretching directions (the principal axes of the ellipse). The orthogonal matrix
$Q$ includes both rotations $U$ and $V^T$.
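
The polar factors are easy to compute from any SVD routine. The sketch below assumes the Example 2 matrix is the 2 by 2 matrix with rows (3, 0) and (4, 5), whose smallest singular value is $\sqrt{5}$. For an invertible matrix the factors $Q$, $S$, $K$ are unique, so the sign choices inside `np.linalg.svd` do not matter.

```python
import numpy as np

A = np.array([[3.0, 0.0],
              [4.0, 5.0]])          # assumed matrix of Example 2

U, s, Vt = np.linalg.svd(A)
Q = U @ Vt                          # orthogonal factor   Q = U V^T
S = Vt.T @ np.diag(s) @ Vt          # symmetric factor    S = V Sigma V^T
K = U @ np.diag(s) @ U.T            # reverse order       A = K Q

print(Q * np.sqrt(5))               # expect [[2, -1], [1, 2]]
print(S / np.sqrt(5))               # expect [[2, 1], [1, 2]]
```

The eigenvalues of `S` are the stretching factors $\sqrt{45}$ and $\sqrt{5}$, and `Q` is a pure rotation.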


Here is a fact about rotations. $Q = UV^T$ is the nearest orthogonal matrix to $A$.
This $Q$ makes the norm $\|Q - A\|$ as small as possible. That corresponds to the fact that
$e^{i\theta}$ is the nearest number on the unit circle to $re^{i\theta}$.

The SVD tells us an even more important fact about nearest singular matrices :
The nearest singular matrix $A_0$ to $A$ comes by changing the smallest $\sigma_{\min}$ to zero.


So $\sigma_{\min}$ is measuring the distance from $A$ to singularity. For the matrix in Example 2
that distance is $\sigma_{\min} = \sqrt{5}$. If I change $\sigma_{\min}$ to zero, this knocks out the last (smallest)
piece in $A = \sigma_1u_1v_1^T + \sigma_2u_2v_2^T$. Then only the rank-one (singular!) matrix $\sigma_1u_1v_1^T$
will be left: the closest to $A$. The smallest change had norm $\sigma_2 = \sqrt{5}$ (smaller than 3).

In computational practice we often do knock out a very small $\sigma$. Working with singular
matrices is better than coming too close to zero and not noticing.


The Pseudoinverse $A^+$


By choosing good bases, $A$ multiplies $v_i$ in the row space to give $\sigma_iu_i$ in the column space.
$A^{-1}$ must do the opposite! If $Av = \sigma u$ then $A^{-1}u = v/\sigma$. The singular values of $A^{-1}$
are $1/\sigma$, just as the eigenvalues of $A^{-1}$ are $1/\lambda$. The bases are reversed. The $u$’s are in the
row space of $A^{-1}$, the $v$’s are in the column space.

Until this moment we would have added “if $A^{-1}$ exists.” Now we don’t.
A matrix that multiplies $u_i$ to produce $v_i/\sigma_i$ does exist. It is the pseudoinverse $A^+$:


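$$\textbf{Pseudoinverse of } A \qquad A^+ = V\Sigma^+U^T = \begin{bmatrix} v_1 & \cdots & v_r & \cdots & v_n \end{bmatrix} \begin{bmatrix} \sigma_1^{-1} & & & \\ & \ddots & & \\ & & \sigma_r^{-1} & \\ & & & 0 \end{bmatrix} \begin{bmatrix} u_1 & \cdots & u_r & \cdots & u_m \end{bmatrix}^T$$

The three factors are $n$ by $n$, $n$ by $m$, and $m$ by $m$.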


The pseudoinverse $A^+$ is an $n$ by $m$ matrix. If $A^{-1}$ exists (we said it again), then $A^+$ is
the same as $A^{-1}$. In that case $m = n = r$ and we are inverting $U\Sigma V^T$ to get $V\Sigma^{-1}U^T$.
The new symbol $A^+$ is needed when $r < m$ or $r < n$. Then $A$ has no two-sided inverse,
but it has a pseudoinverse $A^+$ with that same rank $r$:


$$A^+u_i = \frac{1}{\sigma_i}v_i \ \text{ for } i \le r \quad \text{and} \quad A^+u_i = 0 \ \text{ for } i > r.$$

The vectors $u_1, \ldots, u_r$ in the column space of $A$ go back to $v_1, \ldots, v_r$ in the row space.
The other vectors $u_{r+1}, \ldots, u_m$ are in the left nullspace, and $A^+$ sends them to zero.
When we know what happens to all those basis vectors, we know $A^+$.

Notice the pseudoinverse of the diagonal matrix $\Sigma$. Each $\sigma$ in $\Sigma$ is replaced by $\sigma^{-1}$ in
$\Sigma^+$. The product $\Sigma^+\Sigma$ is as near to the identity as we can get. It is a projection matrix,
$\Sigma^+\Sigma$ is partly $I$ and otherwise zero. We can invert the $\sigma$’s, but we can’t do anything about
the zero rows and columns. This example has $\sigma_1 = 2$ and $\sigma_2 = 3$:


$$\Sigma^+\Sigma = \begin{bmatrix} 1/2 & 0 & 0 \\ 0 & 1/3 & 0 \\ 0 & 0 & 0 \end{bmatrix} \begin{bmatrix} 2 & 0 & 0 \\ 0 & 3 & 0 \\ 0 & 0 & 0 \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 0 \end{bmatrix} = \begin{bmatrix} I & 0 \\ 0 & 0 \end{bmatrix}$$


The pseudoinverse $A^+$ is the $n$ by $m$ matrix that makes $AA^+$ and $A^+A$ into projections.

Figure 7.6: $Ax^+$ in the column space goes back to $A^+Ax^+ = x^+$ in the row space.
($A$ takes the row space to the column space. The pseudoinverse $A^+$ takes the column space
back to the row space, and $A^+e = 0$ for $e$ in the nullspace of $A^T$.)


Trying for                      $AA^+$ = projection matrix onto the column space of $A$
$AA^{-1} = A^{-1}A = I$         $A^+A$ = projection matrix onto the row space of $A$


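Example 3 Every rank one matrix is a column times a row. With unit vectors $u$ and $v$,
that is $A = \sigma uv^T$. Its pseudoinverse is $A^+ = vu^T/\sigma$. The product $AA^+$ is $uu^T$, the
projection onto the line through $u$. The product $A^+A$ is $vv^T$.

Example 4 Find the pseudoinverse of $A = \begin{bmatrix} 1 & 1 \\ 1 & 1 \end{bmatrix}$. This matrix is not invertible. The
rank is 1. The only singular value is $\sigma_1 = 2$. That is inverted to $1/2$ in $\Sigma^+$ (also rank 1).


$$A^+ = V\Sigma^+U^T = \frac{1}{\sqrt{2}} \begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix} \begin{bmatrix} 1/2 & 0 \\ 0 & 0 \end{bmatrix} \frac{1}{\sqrt{2}} \begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix} = \frac{1}{4} \begin{bmatrix} 1 & 1 \\ 1 & 1 \end{bmatrix}$$


$A^+$ also has rank 1. Its column space is always the row space of $A$.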
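
The construction $A^+ = V\Sigma^+U^T$ takes only a few lines. In this sketch the function name `pinv_svd` and the relative cutoff `tol` (used to decide which singular values count as zero) are illustrative choices. NumPy's own `np.linalg.pinv` is used as a cross-check. The rank-one test vectors `u` and `v` are made up.

```python
import numpy as np

def pinv_svd(A, tol=1e-12):
    """Pseudoinverse from the SVD: invert the nonzero sigmas, keep the zeros."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    cutoff = tol * s.max()
    s_plus = np.array([1.0 / x if x > cutoff else 0.0 for x in s])
    return (Vt.T * s_plus) @ U.T          # V Sigma^+ U^T

# Example 4: the matrix with four 1's
A = np.array([[1.0, 1.0],
              [1.0, 1.0]])
A_plus = pinv_svd(A)
P_col = A @ A_plus                        # projection onto the column space
P_row = A_plus @ A                        # projection onto the row space

# Example 3: A = sigma u v^T with unit vectors u and v has A^+ = v u^T / sigma
u = np.array([3.0, 4.0]) / 5.0
v = np.array([1.0, 2.0, 2.0]) / 3.0
sigma = 5.0
B_plus = pinv_svd(sigma * np.outer(u, v))
print(A_plus)
```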


Least Squares with Dependent Columns 


That matrix A with four 1’s appeared in Section 4.3 on least squares. It broke the require- 
ment of independent columns. The matrix appeared when we made two measurements, 
both at time t = 1. The closest straight line went halfway between the measurements 3 
and 1, but there was no way to decide on the slope of the best line. 

In matrix language, $A^TA$ was singular. The equation $A^TA\hat{x} = A^Tb$ had infinitely
many solutions. The pseudoinverse gives us a way to choose a “best solution” $x^+ = A^+b$.

Let me repeat the unsolvable $Ax = b$ and the infinitely solvable $A^TA\hat{x} = A^Tb$:

$$Ax = \begin{bmatrix} 1 & 1 \\ 1 & 1 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} 3 \\ 1 \end{bmatrix} = b \qquad A^TA\hat{x} = \begin{bmatrix} 2 & 2 \\ 2 & 2 \end{bmatrix} \begin{bmatrix} \hat{x}_1 \\ \hat{x}_2 \end{bmatrix} = \begin{bmatrix} 4 \\ 4 \end{bmatrix} = A^Tb$$


Any vector $\hat{x} = (1 + c, 1 - c)$ will solve those normal equations $A^TA\hat{x} = A^Tb$.
The purpose of the pseudoinverse is to choose one solution $\hat{x} = x^+$.


$x^+ = A^+b = (1, 1)$ is the shortest solution to $A^TA\hat{x} = A^Tb$ and $A\hat{x} = p$.


You can see that $x^+ = (1, 1)$ is shorter than any other solution $\hat{x} = (1 + c, 1 - c)$.
The length squared of $\hat{x}$ is $(1 + c)^2 + (1 - c)^2 = 2 + 2c^2$. The shortest choice is $c = 0$.
That gives the solution $x^+ = (1, 1)$ in the row space of $A$.

The geometry tells us what $A^+$ should do: Take the column space of $A$ back to the row
space. Both spaces have dimension $r$. Kill off the error vector $e$ in the left nullspace.

The pseudoinverse $A^+$ and this best solution $x^+$ are essential in statistics, because
experiments often have a matrix with dependent columns as well as dependent rows.
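
The shortest-solution property can be confirmed directly for this example, with $A$ the matrix of four 1's and $b = (3, 1)$ from the two measurements. The helper `normal_residual` and the grid of values for $c$ are illustrative choices.

```python
import numpy as np

A = np.array([[1.0, 1.0],
              [1.0, 1.0]])
b = np.array([3.0, 1.0])

x_plus = np.linalg.pinv(A) @ b            # x^+ = A^+ b

def normal_residual(x):
    """Residual of the normal equations A^T A x = A^T b."""
    return A.T @ A @ x - A.T @ b

# Every (1 + c, 1 - c) solves the normal equations; c = 0 gives the shortest one
cs = np.linspace(-2, 2, 9)
lengths = [np.linalg.norm([1 + c, 1 - c]) for c in cs]
print(x_plus, min(lengths))
```

`np.linalg.lstsq(A, b, rcond=None)` returns this same minimum-norm solution when the columns are dependent.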


= REVIEW OF THE KEY IDEAS = 


1. The ellipse of vectors $Ax$ has axes along the singular vectors $u_i$.
2. The matrix norm $\|A\| = \sigma_1$ comes from the vector length: Maximize $\|Ax\|/\|x\|$.
3. Invertible matrix = (orthogonal matrix) (positive definite matrix): $A = QS$.
4. Every $A = U\Sigma V^T$ has a pseudoinverse $A^+ = V\Sigma^+U^T$ that sends $N(A^T)$ to $Z$.


= WORKED EXAMPLES = 


7.4 A If $A$ has rank $n$ (full column rank) then it has a left inverse $L = (A^TA)^{-1}A^T$.
This matrix $L$ gives $LA = I$. Explain why the pseudoinverse is $A^+ = L$ in this case.

If $A$ has rank $m$ (full row rank) then it has a right inverse $R = A^T(AA^T)^{-1}$.
This matrix $R$ gives $AR = I$. Explain why the pseudoinverse is $A^+ = R$ in this case.


Find $L$ for $A_1$ and find $R$ for $A_2$. Find $A^+$ for all three matrices $A_1, A_2, A_3$:


$$A_1 = \begin{bmatrix} 2 \\ 2 \end{bmatrix} \qquad A_2 = \begin{bmatrix} 2 & 2 \end{bmatrix} \qquad A_3 = \begin{bmatrix} 2 & 2 \\ 1 & 1 \end{bmatrix}$$

Solution If $A$ has independent columns then $A^TA$ is invertible—this is a key point
of Section 4.2. Certainly $L = (A^TA)^{-1}A^T$ multiplies $A$ to give $LA = I$: a left inverse.

$AL = A(A^TA)^{-1}A^T$ is the projection matrix (Section 4.2) on the column space.
So $L$ meets the requirements on $A^+$: $LA$ and $AL$ are projections on $C(A^T)$ and $C(A)$.


If $A$ has rank $m$ (full row rank) then $AA^T$ is invertible. Certainly $A$ multiplies
$R = A^T(AA^T)^{-1}$ to give $AR = I$. In the opposite order, $RA = A^T(AA^T)^{-1}A$ is the projection
matrix onto the row space (column space of $A^T$). So $R$ equals the pseudoinverse $A^+$.


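The example A_1 has full column rank (for L) and A_2 has full row rank (for R):


$$A_1^+ = (A_1^T A_1)^{-1} A_1^T = \frac{1}{8}\begin{bmatrix} 2 & 2 \end{bmatrix} \qquad A_2^+ = A_2^T (A_2 A_2^T)^{-1} = \frac{1}{8}\begin{bmatrix} 2 \\ 2 \end{bmatrix}$$

Notice A_1^+ A_1 = [1] and A_2 A_2^+ = [1]. But A_3 has no left or right inverse.
Its rank is not full. Its pseudoinverse brings the column space of A_3 to the row space.


$$A_3^+ = \begin{bmatrix} 2 & 2 \\ 1 & 1 \end{bmatrix}^+ = \frac{v_1 u_1^T}{\sigma_1} = \frac{1}{10}\begin{bmatrix} 2 & 1 \\ 2 & 1 \end{bmatrix}$$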
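
The three pseudoinverses can be checked with exact fractions. This is a sketch under stated assumptions: it takes A_1 = [2; 2], A_2 = [2 2] and A_3 = [2 2; 1 1] as the example matrices, and the helper names (`matmul`, `transpose`, `scale`, `norm_squared`) are hypothetical. For the rank-one A_3 = a b^T it uses the standard formula A^+ = b a^T / (||a||^2 ||b||^2).

```python
from fractions import Fraction

def matmul(X, Y):
    """Product of two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]

def transpose(X):
    return [list(col) for col in zip(*X)]

def scale(s, X):
    return [[s * x for x in row] for row in X]

def norm_squared(column):
    """Squared length of a column vector stored as an n by 1 matrix."""
    return sum(row[0] ** 2 for row in column)

A1 = [[Fraction(2)], [Fraction(2)]]   # 2 by 1, full column rank (assumed example)
A2 = [[Fraction(2), Fraction(2)]]     # 1 by 2, full row rank (assumed example)

# Left inverse L = (A1^T A1)^{-1} A1^T. Here A1^T A1 is the 1 by 1 matrix [8].
L = scale(1 / matmul(transpose(A1), A1)[0][0], transpose(A1))
# Right inverse R = A2^T (A2 A2^T)^{-1}. Here A2 A2^T is also [8].
R = scale(1 / matmul(A2, transpose(A2))[0][0], transpose(A2))

# Rank-one A3 = a b^T has pseudoinverse b a^T / (|a|^2 |b|^2).
a = [[Fraction(2)], [Fraction(1)]]
b = [[Fraction(1)], [Fraction(1)]]
A3 = matmul(a, transpose(b))
A3_plus = scale(1 / (norm_squared(a) * norm_squared(b)), matmul(b, transpose(a)))

print(L, R, A3_plus)
```

The check A_3 A_3^+ A_3 = A_3 is one of the defining properties of a pseudoinverse, so it is a reasonable sanity test for the rank-deficient case where no left or right inverse exists.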


Problem Set 7.4 


Problems 1-4 compute and use the SVD of a particular matrix (not invertible). 


1 (a) Compute A^T A and its eigenvalues and unit eigenvectors v_1 and v_2. Find σ_1.


Rank one matrix A = L l 
(b) Compute AA^T and its eigenvalues and unit eigenvectors u_1 and u_2.
(c) Verify that Av_1 = σ_1 u_1. Put numbers into A = UΣV^T (this is the SVD).


2 (a) From the u’s and v’s in Problem 1 write down orthonormal bases for the four
fundamental subspaces of this matrix A. 


(b) Describe all matrices that have those same four subspaces. Multiples of A? 


3 From U, V, and Σ in Problem 1 find the orthogonal matrix Q = UV^T and the
symmetric matrix S = VΣV^T. Verify the polar decomposition A = QS. This S is
only semidefinite because ____. Test S^2 = A^T A.


4 Compute the pseudoinverse A^+ = VΣ^+U^T. The diagonal matrix Σ^+ contains 1/σ_1.
Rename the four subspaces (for A) in Figure 7.6 as four subspaces for A^+. Compute
the projections A^+A and AA^+ on the row and column spaces of A.
Problems 5-9 are about the SVD of an invertible matrix.


5 Compute A^T A and its eigenvalues and unit eigenvectors v_1 and v_2. What are the
singular values σ_1 and σ_2 for this matrix A?


a= [3 3] 


6 AA^T has the same eigenvalues σ_1^2 and σ_2^2 as A^T A. Find unit eigenvectors u_1 and
u_2. Put numbers into the SVD:


$$A = \begin{bmatrix} u_1 & u_2 \end{bmatrix} \begin{bmatrix} \sigma_1 & \\ & \sigma_2 \end{bmatrix} \begin{bmatrix} v_1 & v_2 \end{bmatrix}^T$$


7 In Problem 6, multiply columns times rows to show that A = σ_1 u_1 v_1^T + σ_2 u_2 v_2^T.
Prove from A = UΣV^T that every matrix of rank r is the sum of r matrices of rank
one.


8 From U, V, and Σ find the orthogonal matrix Q = UV^T and the symmetric matrix
K = UΣU^T. Verify the polar decomposition in reverse order A = KQ.


9 The pseudoinverse of this A is the same as ____ because ____.


Problems 10-11 compute and use the SVD of a 1 by 3 rectangular matrix.


10 Compute A^T A and AA^T and their eigenvalues and unit eigenvectors when the matrix
is A = [3 4 0]. What are the singular values of A?


11 Put numbers into the singular value decomposition of A:

$$A = \begin{bmatrix} 3 & 4 & 0 \end{bmatrix} = \begin{bmatrix} u_1 \end{bmatrix} \begin{bmatrix} \sigma_1 & 0 & 0 \end{bmatrix} \begin{bmatrix} v_1 & v_2 & v_3 \end{bmatrix}^T$$

Put numbers into the pseudoinverse VΣ^+U^T of A. Compute AA^+ and A^+A:

$$\text{Pseudoinverse} \quad A^+ = \begin{bmatrix} v_1 & v_2 & v_3 \end{bmatrix} \begin{bmatrix} 1/\sigma_1 \\ 0 \\ 0 \end{bmatrix} \begin{bmatrix} u_1 \end{bmatrix}^T$$


12 What is the only 2 by 3 matrix that has no pivots and no singular values? What is
Σ for that matrix? A^+ is the zero matrix, but what is its shape?


13 If det A = 0 why is det A^+ = 0? If A has rank r, why does A^+ have rank r?


14 For vectors in the unit circle ||x|| = 1, the vectors y = Ax in the ellipse will
have ||A^{-1} y|| = 1. This ellipse has axes along the singular vectors with lengths
= σ_1, ..., σ_r (as in Figure 7.5). Expand ||A^{-1} y||^2 = 1 for A = [2 1; 1 2].
Problems 15-18 bring out the main properties of A^+ and x^+ = A^+ b.


15 All matrices in this problem have rank one. The vector b is (b1, b2). 


ES ES dee 2) 


(a) The equation A^T A x̂ = A^T b has many solutions because A^T A is ____.
(b) Verify that x^+ = A^+ b = (.2b_1 + .1b_2, .2b_1 + .1b_2) solves A^T A x^+ = A^T b.


(c) Add (1, −1) to that x^+ to get another solution to A^T A x̂ = A^T b. Show that
||x̂||^2 = ||x^+||^2 + 2, and x^+ is shorter.


16 The vector x^+ = A^+ b is the shortest possible solution to A^T A x̂ = A^T b.
Reason: The difference x̂ − x^+ is in the nullspace of A^T A. This is also the nullspace
of A, orthogonal to x^+. Explain how it follows that ||x̂||^2 = ||x^+||^2 + ||x̂ − x^+||^2.


17 Every b in R^m is p + e. This is the column space part plus the left nullspace part.
Every x in R^n is x_r + x_n. This is the row space part plus the nullspace part. Then


AA^+ p = ____     AA^+ e = ____     A^+A x_r = ____     A^+A x_n = ____


18 Find A^+ and A^+A and AA^+ and x^+ for this matrix A = UΣV^T and these b:


egie -iee : 


19 A general 2 by 2 matrix A is determined by four numbers. If triangular, it is
determined by three. If diagonal, by two. If a rotation, by one. If a unit eigenvector,
also by one. Check that the total count is four for each factorization of A:


Four numbers in LU     LDU     QR     UΣV^T     XΛX^{-1}.


20 Following Problem 18, check that LDL^T and QΛQ^T are determined by three
numbers. This is correct because the matrix is now ____.


21 From A and A^+ show that A^+A is correct and (A^+A)^2 = A^+A = projection.


$$A = \sum_{i=1}^{r} \sigma_i u_i v_i^T \qquad A^+ = \sum_{i=1}^{r} \frac{v_i u_i^T}{\sigma_i} \qquad A^+A = \sum_{i=1}^{r} v_i v_i^T \qquad AA^+ = \sum_{i=1}^{r} u_i u_i^T$$

22 Each pair of singular vectors v and u has Av = σu and A^T u = σv. Show that the
double vector [v; u] is an eigenvector of the symmetric block matrix

$$M = \begin{bmatrix} 0 & A^T \\ A & 0 \end{bmatrix}.$$

The SVD of A is equivalent to the diagonalization of that symmetric matrix M.


Chapter 8 


Linear Transformations 


8.1 The Idea of a Linear Transformation 


1 A linear transformation T takes vectors v to vectors T(v). Linearity requires

T(cv + dw) = cT(v) + dT(w).     Note T(0) = 0 so T(v) = v + u_0 is not linear.

2 The input vectors v and outputs T(v) can be in R^n or matrix space or function space.

3 If A is m by n, T(x) = Ax is linear from the input space R^n to the output space R^m.

4 The derivative T(f) = df/dx is linear. The integral T^+(f) = ∫_0^x f(t) dt is its pseudoinverse.

5 The product ST of two linear transformations is still linear: (ST)(v) = S(T(v)).


When a matrix A multiplies a vector v, it “transforms” v into another vector Av.
In goes v, out comes T(v) = Av. A transformation T follows the same idea as a function.
In goes a number x, out comes f(x). For one vector v or one number x, we multiply
by the matrix or we evaluate the function. The deeper goal is to see all vectors v at once.
We are transforming the whole space V when we multiply every v by A.

Start again with a matrix A. It transforms v to Av. It transforms w to Aw. Then we 
know what happens to u = v + w. There is no doubt about Au, it has to equal Av + Aw. 
Matrix multiplication T(v) = Av gives a linear transformation : 


A transformation T assigns an output T (v) to each input vector v in V. 


The transformation is linear if it meets these requirements for all v and w: 


(a) T(v + w) = T(v) + T(w)     (b) T(cv) = cT(v) for all c.
If the input is v = 0, the output must be T(v) = 0. We combine rules (a) and (b) into one:
Linear transformation     T(cv + dw) must equal cT(v) + dT(w).


Again I can test matrix multiplication for linearity: A(cv + dw) = cAv + dAw is true. 


A linear transformation is highly restricted. Suppose T adds u_0 to every vector.
Then T(v) = v + u_0 and T(w) = w + u_0. This isn’t good, or at least it isn’t linear.
Applying T to v + w produces v + w + u_0. That is not the same as T(v) + T(w):


Shift is not linear     v + w + u_0 is not T(v) + T(w) = (v + u_0) + (w + u_0).


The exception is when u_0 = 0. The transformation reduces to T(v) = v. This is the
identity transformation (nothing moves, as in multiplication by the identity matrix).
That is certainly linear. In this case the input space V is the same as the output space W.


The linear-plus-shift transformation T(v) = Av + u_0 is called “affine”. Straight lines
stay straight although T is not linear. Computer graphics works with affine transformations
in Section 10.6, because we must be able to move images.
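
The combined rule T(cv + dw) = cT(v) + dT(w) is easy to test on sample inputs. In this sketch the matrix `A`, the shift `u0`, and the helper `is_linear_on` are all illustrative assumptions. One passing sample does not prove linearity, but one failing sample is enough to rule it out.

```python
def add(v, w):
    return [a + b for a, b in zip(v, w)]

def scale(c, v):
    return [c * a for a in v]

def is_linear_on(T, v, w, c, d):
    """Check T(cv + dw) == cT(v) + dT(w) for one choice of v, w, c, d."""
    left = T(add(scale(c, v), scale(d, w)))
    right = add(scale(c, T(v)), scale(d, T(w)))
    return left == right

A = [[2, 1], [1, 3]]   # illustrative matrix
u0 = [1, -1]           # illustrative nonzero shift

def multiply(v):
    """T(v) = Av."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]

def shift(v):
    """T(v) = v + u0."""
    return add(v, u0)

def affine(v):
    """T(v) = Av + u0."""
    return add(multiply(v), u0)

v, w, c, d = [1, 2], [3, -1], 2, 5
for T in (multiply, shift, affine):
    print(T.__name__, is_linear_on(T, v, w, c, d))
```

The sample uses c + d = 7 on purpose. When c + d = 1 the shift u_0 appears once on each side, so a shifted transformation would pass that particular test even though it is not linear.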


Example 1 Choose a fixed vector a = (1, 3, 4), and let T(v) be the dot product a · v:


The input is v = (v_1, v_2, v_3). The output is T(v) = a · v = v_1 + 3v_2 + 4v_3.


Dot products are linear. The inputs v come from three-dimensional space, so V = R^3.
The outputs are just numbers, so the output space is W = R^1. We are multiplying by the
row matrix A = [1 3 4]. Then T(v) = Av.

You will get good at recognizing which transformations are linear. If the output involves
squares or products or lengths, v_1^2 or v_1 v_2 or ||v||, then T is not linear.


Example 2 The length T(v) = ||v|| is not linear. Requirement (a) for linearity would be
||v + w|| = ||v|| + ||w||. Requirement (b) would be ||cv|| = c||v||. Both are false!


Not (a): The sides of a triangle satisfy an inequality ||v + w|| ≤ ||v|| + ||w||.
Not (b): The length ||−v|| is ||v|| and not −||v||. For negative c, linearity fails.
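
Examples 1 and 2 can be compared side by side. This is a small sketch: the vector a = (1, 3, 4) comes from Example 1, while the test inputs `v` and `w` and the helper names are illustrative assumptions.

```python
import math

a = (1, 3, 4)   # the fixed vector from Example 1

def dot_with_a(v):
    """T(v) = a . v, a linear transformation from R^3 to R^1."""
    return sum(ai * vi for ai, vi in zip(a, v))

def length(v):
    """T(v) = ||v||, which is not linear."""
    return math.sqrt(sum(x * x for x in v))

def add(v, w):
    return tuple(x + y for x, y in zip(v, w))

def scale(c, v):
    return tuple(c * x for x in v)

v, w = (3, 4, 0), (0, 0, 5)   # illustrative inputs

print(dot_with_a(add(v, w)), dot_with_a(v) + dot_with_a(w))  # equal: rule (a) holds
print(length(add(v, w)), length(v) + length(w))              # unequal: rule (a) fails
print(length(scale(-1, v)), -1 * length(v))                  # unequal: rule (b) fails
```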


Example 3 (Rotation) T is the transformation that rotates every vector by 30°. The
“domain” of T is the xy plane (all input vectors v). The “range” of T is also the xy plane
(all rotated vectors T(v)). We described T without a matrix: rotate the plane by 30°.


Is rotation linear? Yes it is. We can rotate two vectors and add the results. The sum of 
rotations T (v) + T(w) is the same as the rotation T (v + w) of the sum. The whole plane 
is turning together, in this linear transformation. 
Lines to Lines, Triangles to Triangles, Basis Tells All


Figure 8.1 shows the line from v to w in the input space. It also shows the line from T(v)
to T(w) in the output space. Linearity tells us: Every point on the input line goes onto
the output line. And more than that: Equally spaced points go to equally spaced points.
The middle point u = ½v + ½w goes to the middle point T(u) = ½T(v) + ½T(w).

The second figure moves up a dimension. Now we have three corners v_1, v_2, v_3.
Those inputs have three outputs T(v_1), T(v_2), T(v_3). The input triangle goes onto the
output triangle. Equally spaced points stay equally spaced (along the edges, and then
between the edges). The middle point u = ⅓(v_1 + v_2 + v_3) goes to the middle point


T(u) = ⅓(T(v_1) + T(v_2) + T(v_3)).


Figure 8.1: Lines to lines, equal spacing to equal spacing, u = 0 to T(u) = 0.


The rule of linearity extends to combinations of three vectors or n vectors: 


Linearity     u = c_1 v_1 + c_2 v_2 + ··· + c_n v_n must transform to

T(u) = c_1 T(v_1) + c_2 T(v_2) + ··· + c_n T(v_n)     (1)


The 2-vector rule starts the 3-vector proof: T(cu + dv + ew) = T(cu) + T(dv + ew). 
Then linearity applies to both of those parts, to give cT (u) + dT (v) + eT (w). 


The n-vector rule (1) leads to the most important fact about linear transformations: 


Suppose you know T(v) for all vectors v_1, ..., v_n in a basis.
Then you know T(u) for every vector u in the space.


You see the reason: Every u in the space is a combination of the basis vectors v_j.
Then linearity tells us that T(u) is the same combination of the outputs T(v_j).
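
That fact can be turned into a short computation. The sketch below assumes a basis v_1, v_2 of R^2 and made-up outputs T(v_1), T(v_2) in R^3; the function names `coefficients` and `apply_T` are hypothetical. It first finds the combination u = c_1 v_1 + c_2 v_2 and then forms the same combination of the outputs.

```python
from fractions import Fraction

def coefficients(v1, v2, u):
    """Solve c1*v1 + c2*v2 = u for a basis v1, v2 of R^2 (Cramer's rule)."""
    det = Fraction(v1[0] * v2[1] - v2[0] * v1[1])
    c1 = (u[0] * v2[1] - v2[0] * u[1]) / det
    c2 = (v1[0] * u[1] - u[0] * v1[1]) / det
    return c1, c2

def apply_T(v1, v2, Tv1, Tv2, u):
    """T(u) is the same combination of T(v1), T(v2) that u is of v1, v2."""
    c1, c2 = coefficients(v1, v2, u)
    return [c1 * p + c2 * q for p, q in zip(Tv1, Tv2)]

# Illustrative data: a basis of R^2 and assumed outputs in R^3.
v1, v2 = (1, 1), (1, -1)
Tv1, Tv2 = (2, 0, 1), (0, 4, 1)

u = (3, 1)                      # u = 2*v1 + 1*v2
print(coefficients(v1, v2, u))
print(apply_T(v1, v2, Tv1, Tv2, u))
```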
Example 4 The transformation T takes the derivative of the input: T(u) = du/dx.


How do you find the derivative of u = 6 − 4x + 3x^2? You start with the derivatives of
1, x, and x^2. Those are the basis vectors. Their derivatives are 0, 1, and 2x. Then you
use linearity for the derivative of any combination:


du/dx = 6 (derivative of 1) − 4 (derivative of x) + 3 (derivative of x^2) = −4 + 6x.

All of calculus depends on linearity! Precalculus finds a few key derivatives, for x^n and
sin x and cos x and e^x. Then linearity applies to all their combinations.

I would say that the only rule special to calculus is the chain rule. That produces the
derivative of a chain of functions f(g(x)).


Nullspace of T(u) = du/dx. For the nullspace we solve T(u) = 0. The derivative is
zero when u is a constant function. So the one-dimensional nullspace is a line in function
space: all multiples of the special solution u = 1.


Column space of T(u) = du/dx. In our example the input space contains all quadratics
a + bx + cx^2. The outputs (the column space) are all linear functions b + 2cx. Notice that
the Counting Theorem is still true: r + (n − r) = n.


dimension (column space)+dimension (nullspace) = 2+1 = 3 = dimension (input space) 


What is the matrix for d/dx? I can’t leave derivatives without asking for a matrix. 

We have a linear transformation T = d/dx. We know what T does to the basis functions:

v_1, v_2, v_3 = 1, x, x^2     dv_1/dx = 0     dv_2/dx = 1 = v_1     dv_3/dx = 2x = 2v_2.     (2)

The 3-dimensional input space V (= quadratics) transforms to the 2-dimensional output
space W (= linear functions). If v_1, v_2, v_3 were vectors, I would know the matrix.


$$A = \begin{bmatrix} 0 & 1 & 0 \\ 0 & 0 & 2 \end{bmatrix} = \text{matrix form of the derivative } T = \frac{d}{dx} \qquad (3)$$


The linear transformation du/dx is perfectly copied by the matrix multiplication Au.


$$\text{Input } u = a + bx + cx^2 \qquad \text{Multiplication } Au = \begin{bmatrix} 0 & 1 & 0 \\ 0 & 0 & 2 \end{bmatrix} \begin{bmatrix} a \\ b \\ c \end{bmatrix} = \begin{bmatrix} b \\ 2c \end{bmatrix} \qquad \text{Output } \frac{du}{dx} = b + 2cx.$$


The connection from T to A (we will connect every transformation to a matrix) depended 
on choosing an input basis 1, x, x? and an output basis 1, x. 
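
A few lines of code show the matrix doing the work of the derivative. This is a minimal sketch that stores a quadratic a + bx + cx^2 as its coefficient list [a, b, c] in the basis 1, x, x^2; the function name `derivative_coeffs` is an illustrative choice.

```python
# Matrix of d/dx from the input basis 1, x, x^2 to the output basis 1, x.
A = [[0, 1, 0],
     [0, 0, 2]]

def derivative_coeffs(u):
    """Multiply A by the coefficient vector u = [a, b, c] of a + bx + cx^2."""
    return [sum(entry * coeff for entry, coeff in zip(row, u)) for row in A]

u = [6, -4, 3]                  # u = 6 - 4x + 3x^2 from Example 4
print(derivative_coeffs(u))     # coefficients of du/dx in the basis 1, x
```

With a different choice of bases the same transformation d/dx would be represented by a different matrix.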


Next we look at integrals. They give the pseudoinverse T^+ of the derivative!
I can’t write T^{-1} and I can’t say “inverse of T” when the derivative of 1 is 0.


Example 5 Integration T^+ is also linear: ∫_0^x (D + Ex) dx = Dx + ½Ex^2.


The input basis is now 1, x. The output basis is 1, x, x^2. The matrix A^+ for T^+ is 3 by 2:


$$\text{Input } v = D + Ex \qquad \text{Multiplication } A^+ v = \begin{bmatrix} 0 & 0 \\ 1 & 0 \\ 0 & \frac{1}{2} \end{bmatrix} \begin{bmatrix} D \\ E \end{bmatrix} = \begin{bmatrix} 0 \\ D \\ \frac{1}{2}E \end{bmatrix} \qquad \text{Output = Integral of } v: \quad T^+(v) = Dx + \tfrac{1}{2}Ex^2$$


The Fundamental Theorem of Calculus says that integration is the (pseudo)inverse of
differentiation. For linear algebra, the matrix A^+ is the (pseudo)inverse of the matrix A:


$$A^+A = \begin{bmatrix} 0 & 0 \\ 1 & 0 \\ 0 & \frac{1}{2} \end{bmatrix} \begin{bmatrix} 0 & 1 & 0 \\ 0 & 0 & 2 \end{bmatrix} = \begin{bmatrix} 0 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} \quad \text{and} \quad AA^+ = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}. \qquad (4)$$

The derivative of a constant function is zero. That zero is on the diagonal of A^+A.
Calculus wouldn’t be calculus without that 1-dimensional nullspace of T = d/dx.


Examples of Transformations (mostly linear) 


Example 6 Project every 3-dimensional vector onto the horizontal plane z = 1. The
vector v = (x, y, z) is transformed to T(v) = (x, y, 1). This transformation is not linear.
Why not? It doesn’t even transform v = 0 into T(v) = 0.


Example 7 Suppose A is an invertible matrix. Certainly T(v + w) = Av + Aw =
T(v) + T(w). Another linear transformation is multiplication by A^{-1}. This produces the
inverse transformation T^{-1}, which brings every vector T(v) back to v:


T^{-1}(T(v)) = v matches the matrix multiplication A^{-1}(Av) = v.
If T(v) = Av and S(u) = Bu, then the product T(S(u)) matches the product ABu.


We are reaching an unavoidable question. Are all linear transformations from V = R^n
to W = R^m produced by matrices? When a linear T is described as a “rotation” or
“projection” or “. . .”, is there always a matrix A hiding behind T? Is T(v) always Av?

The answer is yes! This is an approach to linear algebra that doesn’t start with 
matrices. We still end up with matrices—after we choose an input basis and output basis. 


Note Transformations have a language of their own. For a matrix, the column space 
contains all outputs Av. The nullspace contains all inputs for which Av = 0. Translate 
those words into “range” and “kernel”: 


Range of T = set of all outputs T(v). Range corresponds to column space. 
Kernel of T = set of all inputs for which T(v) = 0. Kernel corresponds to nullspace. 


The range is in the output space W. The kernel is in the input space V. When T is 
multiplication by a matrix, T(v) = Av, range is column space and kernel is nullspace. 
Linear Transformations of the Plane


It is more interesting to see a transformation than to define it. When a 2 by 2 matrix A
multiplies all vectors in R^2, we can watch how it acts. Start with a “house” that has eleven
endpoints. Those eleven vectors v are transformed into eleven vectors Av. Straight lines
between v’s become straight lines between the transformed vectors Av. (The transformation
from house to house is linear!) Applying A to a standard house produces a new
house, possibly stretched or rotated or otherwise unlivable.

This part of the book is visual, not theoretical. We will show four houses and the
matrices that produce them. The columns of H are the eleven corners of the first house. (H
is 2 by 12, so plot2d in Problem 25 will connect the 11th corner to the first.) A multiplies
the 11 points in the house matrix H to produce the corners AH of the other houses.

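$$\text{House matrix} \quad H = \begin{bmatrix} -6 & -6 & -7 & 0 & 7 & 6 & 6 & -3 & -3 & 0 & 0 & -6 \\ -7 & 2 & 1 & 8 & 1 & 2 & -7 & -7 & -2 & -2 & -7 & -7 \end{bmatrix}$$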


[Four house plots. The legible panel matrices include A = I and the rotation A = [cos 35° −sin 35°; sin 35° cos 35°].]


Figure 8.2: Linear transformations of a house drawn by plot2d(A * H).
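
The computation behind such a figure is one matrix product AH. The sketch below is only an illustration: the outline `H` is a small made-up shape (not the book's 12-column house matrix), `transform` is a hypothetical helper, and the 35° rotation is taken from the figure. It prints the corners instead of plotting them.

```python
import math

def transform(A, H):
    """Multiply the 2 by 2 matrix A by the 2 by n matrix H of corner points."""
    n = len(H[0])
    return [[A[i][0] * H[0][j] + A[i][1] * H[1][j] for j in range(n)]
            for i in range(2)]

# A small illustrative outline: walls and a roof peak. The last column
# repeats the first so that a plotting routine would close the shape.
H = [[-6, -6, 0, 6, 6, -6],
     [-7, 2, 8, 2, -7, -7]]

theta = math.radians(35)
A = [[math.cos(theta), -math.sin(theta)],
     [math.sin(theta), math.cos(theta)]]

AH = transform(A, H)
for j in range(len(H[0])):
    print((H[0][j], H[1][j]), "->", (round(AH[0][j], 3), round(AH[1][j], 3)))
```

A rotation keeps every corner at its original distance from the origin, which gives a simple check on the output. Other matrices A would stretch or flatten the outline.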


= REVIEW OF THE KEY IDEAS = 
1. A transformation T takes each v in the input space to T(v) in the output space. 
2. T is linear if T(v + w) = T(v) + T(w) and T(cv) = cT (v): lines to lines. 
3. Combinations to combinations: T(c_1 v_1 + ··· + c_n v_n) = c_1 T(v_1) + ··· + c_n T(v_n).
4. T = derivative and T^+ = integral are linear. So is T(v) = Av from R^n to R^m.


= WORKED EXAMPLES =


8.1 A The elimination matrix [1 0; 1 1] gives a shearing transformation from (x, y) to
T(x, y) = (x, x + y). If the inputs fill a square, draw the transformed square.


Solution The points (1, 0) and (2, 0) on the x axis transform by T to (1, 1) and (2, 2) on
the 45° line. Points on the y axis are not moved: T(0, y) = (0, y) = eigenvectors with λ = 1.


Vertical lines slide up. This is the shearing A = [1 0; 1 1]. Squares go to parallelograms.

8.1 B A nonlinear transformation T is invertible if every b in the output space comes
from exactly one x in the input space: T(x) = b always has exactly one solution.
Which of these transformations (on real numbers x) is invertible and what is T^{-1}?
None are linear, not even T_3. When you solve T(x) = b, you are inverting T:


T_1(x) = x^2     T_2(x) = x^3     T_3(x) = x + 9     T_4(x) = e^x     T_5(x) = 1/x for nonzero x’s


Solution T_1 is not invertible: x^2 = 1 has two solutions and x^2 = −1 has no solution.
T_4 is not invertible because e^x = −1 has no solution. (If the output space
changes to positive b’s then the inverse of e^x = b is x = ln b.)


Notice T_5^2 = identity. But T_3^2(x) = x + 18. What are T_2^2(x) and T_4^2(x)?


T_2, T_3, T_5 are invertible: x^3 = b and x + 9 = b and 1/x = b have one solution x.


x = T_2^{-1}(b) = b^{1/3}     x = T_3^{-1}(b) = b − 9     x = T_5^{-1}(b) = 1/b


Problem Set 8.1 


1 A linear transformation must leave the zero vector fixed: T(0) = 0. Prove this from
T(v + w) = T(v) + T(w) by choosing w = ____ (and finish the proof). Prove it
also from T(cv) = cT(v) by choosing c = ____.


2 Requirement (b) gives T(cv) = cT(v) and also T(dw) = dT(w). Then by addition,
requirement (a) gives T( ) = ( ). What is T(cv + dw + ew)?


3 Which of these transformations are not linear? The input is v = (v_1, v_2):


(a) T(v) = (v_2, v_1)     (b) T(v) = (v_1, v_1)     (c) T(v) = (0, v_1)
(d) T(v) = (0, 1)     (e) T(v) = v_1 − v_2     (f) T(v) = v_1 v_2.
4 If S and T are linear transformations, is T(S(v)) linear or quadratic?


(a) (Special case) If S(v) = v and T(v) = v, then T(S(v)) = v or v^2?


(b) (General case) S(v_1 + v_2) = S(v_1) + S(v_2) and T(v_1 + v_2) = T(v_1) + T(v_2)
combine into
T(S(v_1 + v_2)) = T( ____ ) = ____ + ____.


5 Suppose T(v) = v except that T(0, v_2) = (0, 0). Show that this transformation
satisfies T(cv) = cT(v) but does not satisfy T(v + w) = T(v) + T(w).


6 Which of these transformations satisfy T(v + w) = T(v) + T(w) and which satisfy
T(cv) = cT(v)?
(a) T(v) = v/||v||     (b) T(v) = v_1 + v_2 + v_3     (c) T(v) = (v_1, 2v_2, 3v_3)
(d) T(v) = largest component of v.


7 For these transformations of V = R^2 to W = R^2, find T(T(v)). Show that when
T(v) is linear, then also T(T(v)) is linear.

(a) T(v) = −v     (b) T(v) = v + (1, 1)

(c) T(v) = 90° rotation = (−v_2, v_1)

(d) T(v) = projection = ½(v_1 + v_2, v_1 + v_2).


8 Find the range and kernel (like the column space and nullspace) of T:
(a) T(v_1, v_2) = (v_1 − v_2, 0)     (b) T(v_1, v_2, v_3) = (v_1, v_2)
(c) T(v_1, v_2) = (0, 0)     (d) T(v_1, v_2) = (v_1, v_1).


9 The “cyclic” transformation T is defined by T(v_1, v_2, v_3) = (v_2, v_3, v_1). What is
T(T(v))? What is T^3(v)? What is T^100(v)? Apply T a hundred times to v.


10 A linear transformation from V to W has an inverse from W to V when the range is
all of W and the kernel contains only v = 0. Then T(v) = w has one solution v for
each w in W. Why are these T’s not invertible?


(a) T(v_1, v_2) = (v_2, v_2)     W = R^2
(b) T(v_1, v_2) = (v_1, v_2, v_1 + v_2)     W = R^3
(c) T(v_1, v_2) = v_1     W = R^1


11 If T(v) = Av and A is m by n, then T is “multiplication by A.”


(a) What are the input and output spaces V and W?
(b) Why is range of T = column space of A?
(c) Why is kernel of T = nullspace of A?


12 Suppose a linear T transforms (1, 1) to (2,2) and (2,0) to (0,0). Find T (v): 
(a) v = (2,2) (b) v= (3,1) (c) v= (—1,1) (d) v= (a,b). 
Problems 13-19 may be harder. The input space V contains all 2 by 2 matrices M. 


13 M is any 2 by 2 matrix and A = E al The transformation T' is defined by 
T(M) = AM. What rules of matrix multiplication show that T is linear? 


14 Suppose A = ie 2] . Show that the range of T is the whole matrix space V and the 
kernel is the zero matrix: 


(1) If AM = 0 prove that M must be the zero matrix. 
(2) Find a solution to AM = B for any 2 by 2 matrix B. 


15 Suppose A = E A Show that the identity matrix J is not in the range of T. Find a 
nonzero matrix M such that T(M) = AM is zero. 


16 Suppose T transposes every 2 by 2 matrix M. Try to find a matrix A which gives
AM = M^T. Show that no matrix A will do it. To professors: Is this a linear
transformation that doesn’t come from a matrix? The matrix should be 4 by 4!


17 The transformation T that transposes every 2 by 2 matrix is definitely linear. Which 
of these extra properties are true? 


(a) T^2 = identity transformation.

(b) The kernel of T is the zero matrix. 

(c) Every 2 by 2 matrix is in the range of T. 
(d) T(M) = —M is impossible. 


18 Suppose T(M) = [1 0; 0 0][M][0 0; 0 1]. Find a matrix with T(M) ≠ 0. Describe all
matrices with T(M) = 0 (the kernel) and all output matrices T(M) (the range).


19 If A and B are invertible and T(M) = AMB, find T^{-1}(M) in the form ( )M( ).


Questions 20-26 are about house transformations. The output is T(H) = AH.


20 How can you tell from the picture of T(house) that A is


(a) a diagonal matrix? 
(b) a rank-one matrix?


(c) a lower triangular matrix? 


21 Draw a picture of T(house) for these matrices:


2 0 T T ofi 1 
p=l | and A=|% 3 and v= | i 
22 What are the conditions on A = [a b; c d] to ensure that T(house) will
(a) sit straight up?
(b) expand the house by 3 in all directions?
(c) rotate the house with no change in its shape?


23 Describe T(house) when T(v) = −v + (1, 0). This T is “affine”.


24 Change the house matrix H to add a chimney.


25 The standard house is drawn by plot2d(H). Circles from o and lines from -:

x = H(1,:)'; y = H(2,:)';
axis([-10 10 -10 10]), axis('square')
plot(x, y, 'o', x, y, '-');

Test plot2d(A' * H) and plot2d(A' * A * H) with the matrices in Figure 8.1.


26 Without a computer sketch the houses A * H for these matrices A:
1 0 5.5 55 1 1
: j ane k 4 ne E 4 ae f :


27 This code creates a vector theta of 50 angles. It draws the unit circle and then
it draws T(circle) = ellipse. T(v) = Av takes circles to ellipses.

A = [2 1; 1 2] % You can change A
theta = [0:2*pi/50:2*pi];
circle = [cos(theta); sin(theta)];
ellipse = A * circle;
axis([-4 4 -4 4]); axis('square')
plot(circle(1,:), circle(2,:), ellipse(1,:), ellipse(2,:))


28 Add two eyes and a smile to the circle in Problem 27. (If one eye is dark and the
other is light, you can tell when the face is reflected across the y axis.) Multiply by
matrices A to get new faces.


29 What conditions on det A = ad − bc ensure that the output house AH will
(a) be squashed onto a line?
(b) keep its endpoints in clockwise order (not reflected)?
(c) have the same area as the original house?


30 Why does every linear transformation T from R^2 to R^2 take squares to
parallelograms? Rectangles also go to parallelograms (squashed if T is not invertible).
8.2 The Matrix of a Linear Transformation


1 We know all T(v) if we know T(v_1), ..., T(v_n) for an input basis v_1, ..., v_n: use linearity.

2 Column j in the “matrix for T” comes from applying T to the input basis vector v_j.

3 Write T(v_j) = a_1j w_1 + ··· + a_mj w_m in the output basis of w’s. Those a_ij go into column j.

4 The matrix for T(x) = Ax is A, if the input and output bases = columns of I (n by n) and I (m by m).

5 When the bases change to v’s and w’s, the matrix for the same T changes from A to W^{-1}AV.

6 Best bases: V = W = eigenvectors and V, W = singular vectors give diagonal Λ and Σ.


The next pages assign a matrix A to every linear transformation T. For ordinary column
vectors, the input v is in V = R^n and the output T(v) is in W = R^m. The matrix A for
this transformation will be m by n. Our choice of bases in V and W will decide A.

The standard basis vectors for R^n and R^m are the columns of I. That choice leads to
a standard matrix. Then T(v) = Av in the normal way. But these spaces also have other
bases, so the same transformation T is represented by other matrices. A main theme of
linear algebra is to choose the bases that give the best matrix (a diagonal matrix) for T.


All vector spaces V and W have bases. Each choice of those bases leads to a matrix
for T. When the input basis is different from the output basis, the matrix for T(v) = v will
not be the identity I. It will be the “change of basis matrix”. Here is the key idea:


Suppose we know T(v) for the input basis vectors v_1 to v_n.
Columns 1 to n of the matrix will contain those outputs T(v_1) to T(v_n).


A times c = matrix times vector = combination of those n columns.
Ac is the correct combination c_1 T(v_1) + ··· + c_n T(v_n) = T(v).


Reason Every v is a unique combination c_1 v_1 + ··· + c_n v_n of the basis vectors v_j.
Since T is a linear transformation (here is the moment for linearity), T(v) must be
the same combination c_1 T(v_1) + ··· + c_n T(v_n) of the outputs T(v_j) in the columns.


Our first example gives the matrix A for the standard basis vectors in R^2 and R^3.


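Example 1 Suppose T transforms v_1 = (1, 0) to T(v_1) = (2, 3, 4). Suppose the second
basis vector v_2 = (0, 1) goes to T(v_2) = (5, 5, 5). If T is linear from R^2 to R^3 then its
“standard matrix” is 3 by 2. Those outputs T(v_1) and T(v_2) go into the columns of A:


$$A = \begin{bmatrix} 2 & 5 \\ 3 & 5 \\ 4 & 5 \end{bmatrix} \qquad c_1 = 1 \text{ and } c_2 = 1 \text{ give } T(v_1 + v_2) = \begin{bmatrix} 2 & 5 \\ 3 & 5 \\ 4 & 5 \end{bmatrix} \begin{bmatrix} 1 \\ 1 \end{bmatrix} = \begin{bmatrix} 7 \\ 8 \\ 9 \end{bmatrix}$$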
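
The construction in Example 1 takes only a few lines. In this sketch the outputs T(v_1) = (2, 3, 4) and T(v_2) = (5, 5, 5) come from the example, and the helper name `apply` is an illustrative choice.

```python
# Outputs of T on the standard basis vectors of R^2 (from Example 1).
Tv1 = (2, 3, 4)
Tv2 = (5, 5, 5)

# The outputs become the COLUMNS of A, so each row pairs up matching entries.
A = [list(row) for row in zip(Tv1, Tv2)]

def apply(A, c):
    """Ac = combination of the columns of A with coefficients c."""
    return [sum(a * x for a, x in zip(row, c)) for row in A]

print(A)
print(apply(A, [1, 1]))   # T(v1 + v2)
```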
Change of Basis


Example 2 Suppose the input space V = R^2 is also the output space W = R^2.
Suppose that T(v) = v is the identity transformation. You might expect its matrix
to be I, but that only happens when the input basis is the same as the output basis.
I will choose different bases to see how the matrix is constructed.

For this special case T(v) = v, I will call the matrix B instead of A. We are just
changing basis from the v’s to the w’s. Each v is a combination of w_1 and w_2.


$$\text{Input basis } \begin{bmatrix} v_1 & v_2 \end{bmatrix} = \begin{bmatrix} 3 & 6 \\ 3 & 8 \end{bmatrix} \qquad \text{Output basis } \begin{bmatrix} w_1 & w_2 \end{bmatrix} = \begin{bmatrix} 3 & 0 \\ 1 & 2 \end{bmatrix} \qquad \text{Change of basis } \begin{aligned} v_1 &= 1w_1 + 1w_2 \\ v_2 &= 2w_1 + 3w_2 \end{aligned}$$


Please notice! I wrote the input basis v_1, v_2 in terms of the output basis w_1, w_2.
That is because of our key rule. We apply the identity transformation T to each input
basis vector: T(v_1) = v_1 and T(v_2) = v_2. Then we write those outputs v_1 and v_2
in the output basis w_1 and w_2. Those bold numbers 1, 1 and 2, 3 tell us column 1 and
column 2 of the matrix B (the change of basis matrix): WB = V so B = W^{-1}V.


$$\text{Matrix } B \text{ for change of basis} \qquad \begin{bmatrix} w_1 & w_2 \end{bmatrix} B = \begin{bmatrix} v_1 & v_2 \end{bmatrix} \text{ is } \begin{bmatrix} 3 & 0 \\ 1 & 2 \end{bmatrix} \begin{bmatrix} 1 & 2 \\ 1 & 3 \end{bmatrix} = \begin{bmatrix} 3 & 6 \\ 3 & 8 \end{bmatrix}. \qquad (1)$$


When the input basis is in the columns of a matrix V, and the output basis
is in the columns of W, the change of basis matrix for T = I is B = W^{-1}V.
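
The rule B = W^{-1}V can be checked with exact arithmetic. This sketch uses the bases of Example 2, V = [3 6; 3 8] and W = [3 0; 1 2]; the helpers `inverse_2x2` and `matmul` are hypothetical names written for the illustration.

```python
from fractions import Fraction

def inverse_2x2(M):
    """Inverse of an invertible 2 by 2 matrix by the ad - bc formula."""
    (a, b), (c, d) = M
    det = Fraction(a * d - b * c)
    return [[d / det, -b / det], [-c / det, a / det]]

def matmul(X, Y):
    """Product of two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]

V = [[3, 6], [3, 8]]   # input basis v1, v2 in the columns
W = [[3, 0], [1, 2]]   # output basis w1, w2 in the columns

B = matmul(inverse_2x2(W), V)
print(B)

# Coefficients c in the v basis become d = Bc in the w basis.
c = [[1], [0]]         # the vector u = v1
print(matmul(B, c))
```

The columns of B should reproduce v_1 = 1w_1 + 1w_2 and v_2 = 2w_1 + 3w_2, and multiplying back by W should recover V.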


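The key I see a clear way to understand that rule B = W^{-1}V. Suppose the same
vector u is written in the input basis of v’s and the output basis of w’s. I will do that three ways:


$$\begin{aligned} u &= c_1 v_1 + \cdots + c_n v_n \\ u &= d_1 w_1 + \cdots + d_n w_n \end{aligned} \quad \text{is} \quad \begin{bmatrix} v_1 & \cdots & v_n \end{bmatrix} \begin{bmatrix} c_1 \\ \vdots \\ c_n \end{bmatrix} = \begin{bmatrix} w_1 & \cdots & w_n \end{bmatrix} \begin{bmatrix} d_1 \\ \vdots \\ d_n \end{bmatrix} \quad \text{and} \quad Vc = Wd.$$


The coefficients d in the new basis of w’s are d = W^{-1}Vc. Then B is W^{-1}V.     (2)

This formula B = W^{-1}V produces one of the world’s greatest mysteries: When the
standard basis V = I is changed to a different basis W, the change of basis matrix is
not W but B = W^{-1}. Larger basis vectors have smaller coefficients!


$$\begin{bmatrix} x \\ y \end{bmatrix} \text{ in the standard basis has coefficients } \begin{bmatrix} w_1 & w_2 \end{bmatrix}^{-1} \begin{bmatrix} x \\ y \end{bmatrix} \text{ in the } w_1, w_2 \text{ basis.}$$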
Construction of the Matrix


Now we construct a matrix for any linear transformation. Suppose T transforms the space
V (n-dimensional) to the space W (m-dimensional). We choose a basis v_1, ..., v_n for V
and we choose a basis w_1, ..., w_m for W. The matrix A will be m by n. To find the first
column of A, apply T to the first basis vector v_1. The output T(v_1) is in W.


T(v_1) is a combination a_11 w_1 + ··· + a_m1 w_m of the output basis for W.


These numbers a_11, ..., a_m1 go into the first column of A. Transforming v_1 to T(v_1)
matches multiplying (1, 0, ..., 0) by A. It yields that first column of the matrix.
When T is the derivative and the first basis vector is 1, its derivative is T(v_1) = 0.
So for the derivative matrix below, the first column of A is all zero.


Example 3 The input basis of v’s is 1, x, x^2, x^3. The output basis of w’s is 1, x, x^2.


Then T takes the derivative: T(v) = dv/dx and A = “derivative matrix”.

$$\text{If } v = c_1 + c_2 x + c_3 x^2 + c_4 x^3 \text{ then } \frac{dv}{dx} = 1c_2 + 2c_3 x + 3c_4 x^2 \qquad Ac = \begin{bmatrix} 0 & 1 & 0 & 0 \\ 0 & 0 & 2 & 0 \\ 0 & 0 & 0 & 3 \end{bmatrix} \begin{bmatrix} c_1 \\ c_2 \\ c_3 \\ c_4 \end{bmatrix} = \begin{bmatrix} c_2 \\ 2c_3 \\ 3c_4 \end{bmatrix}$$

Key rule: The jth column of A is found by applying T to the jth basis vector v_j:
T(v_j) = combination of output basis vectors = a_1j w_1 + ··· + a_mj w_m.     (3)


These numbers a_ij go into A. The matrix is constructed to get the basis vectors right.
Then linearity gets all other vectors right. Every v is a combination c_1 v_1 + ··· + c_n v_n,
and T(v) is a combination of the w’s. When A multiplies the vector c = (c_1, ..., c_n)
in the v combination, Ac produces the coefficients in the T(v) combination. This is
because matrix multiplication (combining columns) is linear like T.

The matrix A tells us what T does. Every linear transformation from V to W can be 
converted to a matrix. This matrix depends on the bases. 


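Example 4 For the integral T^+(v), the first basis function is again 1. Its integral is the
second basis function x. So the first column of the “integral matrix” A^+ is (0, 1, 0, 0).


$$\text{The integral of } d_1 + d_2 x + d_3 x^2 \text{ is } d_1 x + \tfrac{1}{2} d_2 x^2 + \tfrac{1}{3} d_3 x^3 \qquad A^+ d = \begin{bmatrix} 0 & 0 & 0 \\ 1 & 0 & 0 \\ 0 & \frac{1}{2} & 0 \\ 0 & 0 & \frac{1}{3} \end{bmatrix} \begin{bmatrix} d_1 \\ d_2 \\ d_3 \end{bmatrix} = \begin{bmatrix} 0 \\ d_1 \\ \frac{1}{2} d_2 \\ \frac{1}{3} d_3 \end{bmatrix}$$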
If you integrate a function and then differentiate, you get back to the start. So AA^+ = I.
But if you differentiate before integrating, the constant term is lost. So A^+A is not I.
The integral of the derivative of 1 is zero:


T^+T(1) = integral of zero function = 0.


This matches A^+A, whose first column is all zero. The derivative T has a kernel (the
constant functions). Its matrix A has a nullspace. Main idea again: Av copies T(v).

The examples of the derivative and integral made three points. First, linear
transformations T are everywhere: in calculus and differential equations and linear algebra.
Second, spaces other than R^n are important: we had functions in V and W. Third,
if we differentiate and then integrate, we can multiply their matrices A^+A.
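
Both products can be formed exactly. The sketch below builds the derivative matrix and the integral matrix for polynomials up to degree n, using the monomial bases of Examples 3 and 4; the function names `derivative_matrix` and `integral_matrix` are illustrative assumptions.

```python
from fractions import Fraction

def matmul(X, Y):
    """Product of two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]

def derivative_matrix(n):
    """Matrix of d/dx from basis 1, x, ..., x^n to basis 1, x, ..., x^(n-1)."""
    return [[Fraction(j) if j == i + 1 else Fraction(0) for j in range(n + 1)]
            for i in range(n)]

def integral_matrix(n):
    """Matrix of integration from basis 1, ..., x^(n-1) to basis 1, ..., x^n."""
    return [[Fraction(1, i) if i == j + 1 else Fraction(0) for j in range(n)]
            for i in range(n + 1)]

A = derivative_matrix(3)        # 3 by 4, as in Example 3
A_plus = integral_matrix(3)     # 4 by 3, as in Example 4

print(matmul(A, A_plus))        # integrate, then differentiate
print(matmul(A_plus, A))        # differentiate, then integrate
```

The second product has a zero in its first diagonal entry, which is the lost constant term.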


Matrix Products AB Match Transformations TS


We have come to something important: the real reason for the rule to multiply matrices.
At last we discover why! Two linear transformations T and S are represented by two
matrices A and B. Now compare TS with the multiplication AB:


When we apply the transformation T to the output from S, we get TS by this rule:
(TS)(u) is defined to be T(S(u)). The output S(u) becomes the input to T.


When we apply the matrix A to the output from B, we multiply AB by this rule:
(AB)(x) is defined to be A(Bx). The output Bx becomes the input to A.
Matrix multiplication gives the correct matrix AB to represent TS.


The transformation S is from a space U to V. Its matrix B uses a basis $u_1, \ldots, u_p$
for U and a basis $v_1, \ldots, v_n$ for V. That matrix is n by p. The transformation T is from
V to W as before. Its matrix A must use the same basis $v_1, \ldots, v_n$ for V; this is the
output space for S and the input space for T. Then the matrix AB matches TS.


Multiplication The linear transformation TS starts with any vector u in U, goes
to S(u) in V and then to T(S(u)) in W. The matrix AB starts with any x in $R^p$,
goes to Bx in $R^n$ and then to ABx in $R^m$. The matrix AB correctly represents TS:


TS: U → V → W        AB: (m by n)(n by p) = (m by p).


The input is $u = x_1u_1 + \cdots + x_pu_p$. The output T(S(u)) matches the output ABx.
Product of transformations TS matches product of matrices AB.

The most important cases are when the spaces U, V, W are the same and their bases 
are the same. With m = n = p we have square matrices that we can multiply. 


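Example 5 S rotates the plane by $\theta$ and T also rotates by $\theta$. Then TS rotates by $2\theta$.
This transformation $T^2$ corresponds to the rotation matrix $A^2$ through $2\theta$:

$$T = S \qquad A = B \qquad T^2 = \text{rotation by } 2\theta \qquad A^2 = \begin{bmatrix} \cos 2\theta & -\sin 2\theta \\ \sin 2\theta & \cos 2\theta \end{bmatrix} \tag{4}$$

By matching (transformation)$^2$ with (matrix)$^2$, we pick up the formulas for $\cos 2\theta$
and $\sin 2\theta$. Multiply A times A:

$$\begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix} \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix} = \begin{bmatrix} \cos^2\theta - \sin^2\theta & -2\sin\theta\cos\theta \\ 2\sin\theta\cos\theta & \cos^2\theta - \sin^2\theta \end{bmatrix} \tag{5}$$

Comparing (4) with (5) produces $\cos 2\theta = \cos^2\theta - \sin^2\theta$ and $\sin 2\theta = 2\sin\theta\cos\theta$.
Trigonometry (the double angle rule) comes from linear algebra.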
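As a numerical illustration of "product of transformations matches product of matrices", the sketch below squares a rotation matrix and compares it with the rotation through twice the angle. The test angle 0.3 and the helper names are arbitrary assumptions, not part of the text.

```python
import math

def rotation(theta):
    """2 by 2 matrix that rotates the plane through the angle theta."""
    return [[math.cos(theta), -math.sin(theta)],
            [math.sin(theta),  math.cos(theta)]]

def matmul(X, Y):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def max_gap(X, Y):
    """Largest entrywise difference between two matrices of the same shape."""
    return max(abs(x - y) for rx, ry in zip(X, Y) for x, y in zip(rx, ry))

theta = 0.3                      # arbitrary test angle (an assumption)
A = rotation(theta)              # matrix for T = S

# A times A should agree with the rotation through 2*theta, up to roundoff.
double_angle_gap = max_gap(matmul(A, A), rotation(2 * theta))
print(double_angle_gap)
```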


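Example 6 S rotates by the angle $\theta$ and T rotates by $-\theta$. Then TS = I leads to AB = I.


In this case T(S(u)) is u. We rotate forward and back. For the matrices to match,
ABx must be x. The two matrices are inverses. Check this by putting $\cos(-\theta) = \cos\theta$
and $\sin(-\theta) = -\sin\theta$ into the backward rotation matrix A:

$$AB = \begin{bmatrix} \cos\theta & \sin\theta \\ -\sin\theta & \cos\theta \end{bmatrix} \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix} = \begin{bmatrix} \cos^2\theta + \sin^2\theta & 0 \\ 0 & \cos^2\theta + \sin^2\theta \end{bmatrix} = I.$$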


Choosing the Best Bases 


Now comes the final step in this section of the book. Choose bases that diagonalize the
matrix. With the standard basis (the columns of I) our transformation T produces some
matrix A, probably not diagonal. That same T is represented by different matrices when
we choose different bases. The two great choices are eigenvectors and singular vectors:


Eigenvectors If T transforms $R^n$ to $R^n$, its matrix A is square. But using the
standard basis, that matrix A is probably not diagonal. If there are n independent
eigenvectors, choose those as the input and output basis. In this good basis,
the matrix for T is the diagonal eigenvalue matrix $\Lambda$.


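Example 7 The projection T projects every v = (x, y) in $R^2$ onto the
line y = −x. Using the standard basis, $v_1 = (1, 0)$ projects to $T(v_1) = (\frac{1}{2}, -\frac{1}{2})$.
For $v_2 = (0, 1)$ the projection is $T(v_2) = (-\frac{1}{2}, \frac{1}{2})$. Those are the columns of A:


Projection matrix, standard bases, not diagonal:

$$A = \begin{bmatrix} \frac{1}{2} & -\frac{1}{2} \\ -\frac{1}{2} & \frac{1}{2} \end{bmatrix} \text{ has } A^T = A \text{ and } A^2 = A.$$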


Now comes the main point of eigenvectors. Make them the basis vectors! Diagonalize!
When the basis vectors are eigenvectors, the matrix becomes diagonal.

$v_1 = w_1 = (1, -1)$ projects to itself: $T(v_1) = v_1$ and $\lambda_1 = 1$
$v_2 = w_2 = (1, 1)$ projects to zero: $T(v_2) = 0$ and $\lambda_2 = 0$


Eigenvector bases, diagonal matrix:

$$\text{The new matrix is } \begin{bmatrix} \lambda_1 & 0 \\ 0 & \lambda_2 \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix} = \Lambda. \tag{6}$$

Eigenvectors are the perfect basis vectors. They produce the eigenvalue matrix $\Lambda$.


What about other choices of input basis = output basis? Put those basis vectors into
the columns of B. We saw above that the change of basis matrices (between standard basis
and new basis) are $B_{in} = B$ and $B_{out} = B^{-1}$. The new matrix for T is similar to A:


$A_{new} = B^{-1}AB$ in the new basis of b’s is similar to A in the standard basis:

$$A_{\text{b's to b's}} = B^{-1}_{\text{standard to b's}} \; A_{\text{standard}} \; B_{\text{b's to standard}} \tag{7}$$

I used the multiplication rule for the transformation ITI. The matrices for I, T, I were
$B^{-1}$, A, B. The matrix B contains the input vectors b in the standard basis.
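Here is a small sketch of the similarity $B^{-1}AB$ for the projection onto the line y = −x, with the eigenvectors (1, −1) and (1, 1) placed in the columns of B. Exact fractions avoid roundoff; the helper names `matmul` and `inverse_2x2` are illustrative assumptions.

```python
from fractions import Fraction as Fr

def matmul(X, Y):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def inverse_2x2(M):
    """Exact inverse of a 2 by 2 matrix with nonzero determinant."""
    (a, b), (c, d) = M
    det = Fr(a * d - b * c)
    return [[d / det, -b / det], [-c / det, a / det]]

# Projection onto the line y = -x, written in the standard basis.
A = [[Fr(1, 2), Fr(-1, 2)],
     [Fr(-1, 2), Fr(1, 2)]]

# Columns of B are the eigenvectors (1, -1) and (1, 1).
B = [[1, 1],
     [-1, 1]]

A_new = matmul(matmul(inverse_2x2(B), A), B)   # matrix of T in the eigenvector basis
print([[str(entry) for entry in row] for row in A_new])
```

The result has the eigenvalues 1 and 0 on its diagonal, in the same order as the eigenvectors in B.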


Finally we allow different spaces V and W, and different bases v’s and w’s. When
we know T and we choose bases, we get a matrix A. Probably A is not symmetric or even
square. But we can always choose v’s and w’s that produce a diagonal matrix. This will
be the singular value matrix $\Sigma = \text{diag}(\sigma_1, \ldots, \sigma_r)$ in the decomposition $A = U\Sigma V^T$.


Singular vectors The SVD says that $U^{-1}AV = \Sigma$. The right singular vectors
$v_1, \ldots, v_n$ will be the input basis. The left singular vectors $u_1, \ldots, u_m$ will be
the output basis. By the rule for matrix multiplication, the matrix for the same
transformation in these new bases is $B_{out}^{-1}AB_{in} = U^{-1}AV = \Sigma$.


I can’t say that $\Sigma$ is “similar” to A. We are working now with two bases, input and output.
But those are orthonormal bases and they preserve the lengths of vectors. Following a good
suggestion by David Vogan, I propose that we say: $\Sigma$ is “isometric” to A.


Definition $C = Q_1^{-1}AQ_2$ is isometric to A if $Q_1$ and $Q_2$ are orthogonal.


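Example 8 To construct the matrix A for the transformation $T = \frac{d}{dx}$, we chose the
input basis $1, x, x^2, x^3$ and the output basis $1, x, x^2$. The matrix A was simple but unfortunately
it wasn’t diagonal. But we can take each basis in the opposite order.

Now the input basis is $x^3, x^2, x, 1$ and the output basis is $x^2, x, 1$. The change of basis
matrices $B_{in}$ and $B_{out}$ are permutations. The matrix for T(u) = du/dx with the new
bases is the diagonal singular value matrix $B_{out}^{-1}AB_{in} = \Sigma$ with $\sigma$’s = 3, 2, 1:

$$B_{out}^{-1}AB_{in} = \begin{bmatrix} 0 & 0 & 1 \\ 0 & 1 & 0 \\ 1 & 0 & 0 \end{bmatrix} \begin{bmatrix} 0 & 1 & 0 & 0 \\ 0 & 0 & 2 & 0 \\ 0 & 0 & 0 & 3 \end{bmatrix} \begin{bmatrix} 0 & 0 & 0 & 1 \\ 0 & 0 & 1 & 0 \\ 0 & 1 & 0 & 0 \\ 1 & 0 & 0 & 0 \end{bmatrix} = \begin{bmatrix} 3 & 0 & 0 & 0 \\ 0 & 2 & 0 & 0 \\ 0 & 0 & 1 & 0 \end{bmatrix}. \tag{8}$$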
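The reordering of the two bases can be replayed in a few lines. This is only a sketch: `reversal(n)` is a hypothetical helper that builds the permutation matrix reversing n basis vectors, and it relies on the fact that such a reversal is its own inverse.

```python
def matmul(X, Y):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def reversal(n):
    """Permutation matrix that reverses the order of n basis vectors."""
    return [[1 if i + j == n - 1 else 0 for j in range(n)] for i in range(n)]

# Derivative matrix for input basis 1, x, x^2, x^3 and output basis 1, x, x^2.
A = [[0, 1, 0, 0],
     [0, 0, 2, 0],
     [0, 0, 0, 3]]

B_in = reversal(4)       # new input basis x^3, x^2, x, 1
B_out = reversal(3)      # new output basis x^2, x, 1
B_out_inverse = B_out    # reversing twice restores the original order

Sigma = matmul(matmul(B_out_inverse, A), B_in)
for row in Sigma:
    print(row)
```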


Well, this was a tough section. We found that $x^3, x^2, x, 1$ have derivatives $3x^2, 2x, 1, 0$.


= REVIEW OF THE KEY IDEAS =


1. If we know $T(v_1), \ldots, T(v_n)$ for a basis, linearity will determine all other T(v).

2. Linear transformation T with input basis $v_1, \ldots, v_n$ and output basis $w_1, \ldots, w_m$ → Matrix A (m by n) represents T in these bases.

3. The change of basis matrix $B = W^{-1}V = B_{out}^{-1}B_{in}$ represents the identity T(v) = v.

4. If A and B represent T and S, and the output basis for S is the input basis for T,
then the matrix AB represents the transformation T(S(u)).

5. The best input-output bases are eigenvectors and/or singular vectors of A. Then

$B^{-1}AB = \Lambda$ = eigenvalues        $B_{out}^{-1}AB_{in} = \Sigma$ = singular values.

= WORKED EXAMPLES = 


8.2 A The space of 2 by 2 matrices has these four “vectors” as a basis:

$$v_1 = \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix} \qquad v_2 = \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix} \qquad v_3 = \begin{bmatrix} 0 & 0 \\ 1 & 0 \end{bmatrix} \qquad v_4 = \begin{bmatrix} 0 & 0 \\ 0 & 1 \end{bmatrix}.$$

T is the linear transformation that transposes every 2 by 2 matrix. What is the matrix A that
represents T in this basis (output basis = input basis)? What is the inverse matrix $A^{-1}$?
What is the transformation $T^{-1}$ that inverts the transpose operation?


Solution Transposing those four “basis matrices” just reverses $v_2$ and $v_3$:

$$\begin{matrix} T(v_1) = v_1 \\ T(v_2) = v_3 \\ T(v_3) = v_2 \\ T(v_4) = v_4 \end{matrix} \quad \text{gives the four columns of} \quad A = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}.$$

The inverse matrix $A^{-1}$ is the same as A. The inverse transformation $T^{-1}$ is the same as T.
If we transpose and transpose again, the final matrix equals the original matrix.

Notice that the space of 2 by 2 matrices is 4-dimensional. So the matrix A (for the
transpose T) is 4 by 4. The nullspace of A is Z and the kernel of T is the zero matrix, the
only matrix that transposes to zero. The eigenvalues of A are 1, 1, 1, −1.
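The construction of A can be sketched directly: apply T to each basis matrix and read off its coordinates as a column. The helper names `transpose` and `coords` are illustrative; the coordinate order (top row first, then bottom row) matches the basis $v_1, v_2, v_3, v_4$ above.

```python
def transpose(M):
    """Transpose a matrix stored as a list of rows."""
    return [list(row) for row in zip(*M)]

def coords(M):
    """Coordinates of a 2 by 2 matrix in the basis v1, v2, v3, v4."""
    return [M[0][0], M[0][1], M[1][0], M[1][1]]

def matmul(X, Y):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

basis = [[[1, 0], [0, 0]],
         [[0, 1], [0, 0]],
         [[0, 0], [1, 0]],
         [[0, 0], [0, 1]]]

# Column j of A holds the coordinates of T(v_j), the transpose of v_j.
columns = [coords(transpose(v)) for v in basis]
A = transpose(columns)

A_squared = matmul(A, A)   # transposing twice returns the original matrix
for row in A:
    print(row)
```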

Which line of matrices has $T(A) = A^T = -A$ with that eigenvalue $\lambda = -1$?


Problem Set 8.2


Questions 1-4 extend the first derivative example to higher derivatives.


1 The transformation S takes the second derivative. Keep $1, x, x^2, x^3$ as the input
basis $v_1, v_2, v_3, v_4$ and also as output basis $w_1, w_2, w_3, w_4$. Write $S(v_1)$, $S(v_2)$,
$S(v_3)$, $S(v_4)$ in terms of the w’s. Find the 4 by 4 matrix $A_2$ for S.


2 What functions have S(v) = 0? They are in the kernel of the second derivative S.
What vectors are in the nullspace of its matrix $A_2$ in Problem 1?


3 The second derivative $A_2$ is not the square of a rectangular first derivative matrix $A_1$:

$$A_1 = \begin{bmatrix} 0 & 1 & 0 & 0 \\ 0 & 0 & 2 & 0 \\ 0 & 0 & 0 & 3 \end{bmatrix} \text{ does not allow } A_1^2 = A_2.$$

Add a zero row 4 to $A_1$ so that output space = input space. Compare $A_1^2$ with $A_2$.
Conclusion: We want output basis = ____ basis. Then m = n.

4 (a) The product TS of first and second derivatives produces the third derivative.
Add zeros to make 4 by 4 matrices, then compute $A_1A_2 = A_3$.
(b) The matrix $A_2^2$ corresponds to $S^2$ = fourth derivative. Why is this zero?
Questions 5-9 are about a particular transformation T and its matrix A. 
5 With bases $v_1, v_2, v_3$ and $w_1, w_2, w_3$, suppose $T(v_1) = w_2$ and $T(v_2) = T(v_3) = w_1 + w_3$.
T is a linear transformation. Find the matrix A and multiply by the
vector (1, 1, 1). What is the output from T when the input is $v_1 + v_2 + v_3$?


6 Since $T(v_2) = T(v_3)$, the solutions to T(v) = 0 are v = ____. What vectors are
in the nullspace of A? Find all solutions to $T(v) = w_2$.


7 Find a vector that is not in the column space of A. Find a combination of w’s that is
not in the range of the transformation T.


8 You don’t have enough information to determine $T^2$. Why is its matrix not necessarily
$A^2$? What more information do you need?


9 Find the rank of A. The rank is not the dimension of the whole output space W.
It is the dimension of the ____ of T.


Questions 10-13 are about invertible linear transformations. 


10 Suppose $T(v_1) = w_1 + w_2 + w_3$ and $T(v_2) = w_2 + w_3$ and $T(v_3) = w_3$. Find
the matrix A for T using these basis vectors. What input vector v gives $T(v) = w_1$?


11 Invert the matrix A in Problem 10. Also invert the transformation T: what are
$T^{-1}(w_1)$ and $T^{-1}(w_2)$ and $T^{-1}(w_3)$?


12 Which of these are true and why is the other one ridiculous?
(a) $T^{-1}T = I$ (b) $T^{-1}(T(v_1)) = v_1$ (c) $T^{-1}(T(w_1)) = w_1$.




13 Suppose the spaces V and W have the same basis $v_1, v_2$.
(a) Describe a transformation T (not I) that is its own inverse.
(b) Describe a transformation T (not I) that equals $T^2$.
(c) Why can’t the same T be used for both (a) and (b)?


Questions 14-19 are about changing the basis. 


14 (a) What matrix B transforms (1,0) into (2,5) and transforms (0, 1) to (1,3)? 
(b) What matrix C transforms (2, 5) to (1,0) and (1,3) to (0,1)? 
(c) Why does no matrix transform (2,6) to (1,0) and (1,3) to (0,1)? 


15 (a) What matrix M transforms (1,0) and (0, 1) to (r,t) and (s, u)? 
(b) What matrix N transforms (a,c) and (b, d) to (1,0) and (0,1)? 
(c) What condition on a, b, c, d will make part (b) impossible? 
16 (a) How do M and N in Problem 15 yield the matrix that transforms (a, c) to (r, t) 
and (b, d) to (s, u)? 
(b) What matrix transforms (2, 5) to (1, 1) and (1,3) to (0, 2)? 
17 If you keep the same basis vectors but put them in a different order, the change of
basis matrix B is a ____ matrix. If you keep the basis vectors in order but change
their lengths, B is a ____ matrix.


18 The matrix that rotates the axis vectors (1, 0) and (0, 1) through an angle $\theta$ is Q.
What are the coordinates (a, b) of the original (1, 0) using the new (rotated) axes?
This inverse can be tricky. Draw a figure or solve for a and b:

$$Q = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix} \qquad \begin{bmatrix} 1 \\ 0 \end{bmatrix} = a\begin{bmatrix} \cos\theta \\ \sin\theta \end{bmatrix} + b\begin{bmatrix} -\sin\theta \\ \cos\theta \end{bmatrix}.$$

19 The matrix that transforms (1, 0) and (0, 1) to (1, 4) and (1, 5) is B = ____.
The combination a(1, 4) + b(1, 5) that equals (1, 0) has (a, b) = ( ____ , ____ ).
How are those new coordinates of (1, 0) related to B or $B^{-1}$?


Questions 20-23 are about the space of quadratic polynomials $y = A + Bx + Cx^2$.


20 The parabola $w_1 = \frac{1}{2}(x^2 + x)$ equals one at x = 1, and zero at x = 0 and x = −1.
Find the parabolas $w_2$, $w_3$, and then find y(x) by linearity.


(a) $w_2$ equals one at x = 0 and zero at x = 1 and x = −1.
(b) $w_3$ equals one at x = −1 and zero at x = 0 and x = 1.
(c) y(x) equals 4 at x = 1 and 5 at x = 0 and 6 at x = −1. Use $w_1, w_2, w_3$.
21 One basis for second-degree polynomials is $v_1 = 1$ and $v_2 = x$ and $v_3 = x^2$.
Another basis is $w_1, w_2, w_3$ from Problem 20. Find two change of basis matrices,
from the w’s to the v’s and from the v’s to the w’s.

22 What are the three equations for A, B, C if the parabola $y = A + Bx + Cx^2$ equals
4 at x = a and 5 at x = b and 6 at x = c? Find the determinant of the 3 by 3 matrix.
That matrix transforms values like 4, 5, 6 to parabolas y, or is it the other way?


23 Under what condition on the numbers $m_1, m_2, \ldots, m_9$ do these three parabolas give
a basis for the space of all parabolas $a + bx + cx^2$?

$v_1 = m_1 + m_2x + m_3x^2 \qquad v_2 = m_4 + m_5x + m_6x^2 \qquad v_3 = m_7 + m_8x + m_9x^2.$

24 The Gram-Schmidt process changes a basis $a_1, a_2, a_3$ to an orthonormal basis
$q_1, q_2, q_3$. These are columns in A = QR. Show that R is the change of basis
matrix from the a’s to the q’s ($a_2$ is what combination of q’s when A = QR?).


25 Elimination changes the rows of A to the rows of U with A = LU. Row 2 of A is
what combination of the rows of U? Writing $A^T = U^TL^T$ to work with columns,
the change of basis matrix is $B = L^T$. We have bases if the matrices are ____.


26 Suppose $v_1, v_2, v_3$ are eigenvectors for T. This means $T(v_i) = \lambda_iv_i$ for i = 1, 2, 3.
What is the matrix for T when the input and output bases are the v’s?


27 Every invertible linear transformation can have I as its matrix! Choose any input
basis $v_1, \ldots, v_n$. For output basis choose $w_i = T(v_i)$. Why must T be invertible?


28 Using $v_1 = w_1$ and $v_2 = w_2$ find the standard matrix for these T’s:
(a) $T(v_1) = 0$ and $T(v_2) = 3v_1$ (b) $T(v_1) = v_1$ and $T(v_1 + v_2) = v_1$.


29 Suppose T reflects the xy plane across the x axis and S is reflection across the y
axis. If v = (x, y) what is S(T(v))? Find a simpler description of the product ST.


30 Suppose T is reflection across the 45° line, and S is reflection across the y axis. If
v = (2, 1) then T(v) = (1, 2). Find S(T(v)) and T(S(v)). Usually ST ≠ TS.


31 The product of two reflections is a rotation. Multiply these reflection matrices to
find the rotation angle:

$$\begin{bmatrix} \cos 2\theta & \sin 2\theta \\ \sin 2\theta & -\cos 2\theta \end{bmatrix} \begin{bmatrix} \cos 2\alpha & \sin 2\alpha \\ \sin 2\alpha & -\cos 2\alpha \end{bmatrix}$$


32 Suppose A is a 3 by 4 matrix of rank r = 2, and T(v) = Av. Choose input basis
vectors $v_1, v_2$ from the row space of A and $v_3, v_4$ from the nullspace. Choose
output basis vectors $w_1 = Av_1$, $w_2 = Av_2$ in the column space and $w_3$ from the
nullspace of $A^T$. What specially simple matrix represents T in these special bases?


33 The space M of 2 by 2 matrices has the basis $v_1, v_2, v_3, v_4$ in Worked
Example 8.2 A. Suppose T multiplies each matrix by $\begin{bmatrix} a & b \\ c & d \end{bmatrix}$. With w’s equal to
v’s, what 4 by 4 matrix A represents this transformation T on matrix space?


34 True or False: If we know T(v) for n different nonzero vectors in $R^n$, then we
know T(v) for every vector v in $R^n$.


8.3 The Search for a Good Basis


1 With a new input basis $B_{in}$ and output basis $B_{out}$, every matrix A becomes $B_{out}^{-1}AB_{in}$.


2 $B_{in} = B_{out}$ = “generalized eigenvectors of A” produces the Jordan form $J = B^{-1}AB$.


3 The Fourier matrix $F = B_{in} = B_{out}$ diagonalizes every circulant matrix (use the FFT).


4 Sines and cosines, Legendre and Chebyshev: those are great bases for function space.


This is an important section of the book. I am afraid that most readers will skip it—or 
won’t get this far. The first chapters prepared the way by explaining the idea of a basis. 
Chapter 6 introduced the eigenvectors x and Chapter 7 found singular vectors v and u. 
Those are two winners but many other choices are very valuable. 

First comes the pure algebra from Section 8.2 and then come good bases. The input 
basis vectors will be the columns of Bin. The output basis vectors will be the columns of 
Bout. Always Bin and Bout are invertible—basis vectors are independent ! 


Pure algebra If A is the matrix for a transformation T in the standard basis, then

$B_{out}^{-1}AB_{in}$ is the matrix in the new bases. (1)


The standard basis vectors are the columns of the identity: $B_{in} = I_{n \times n}$ and $B_{out} = I_{m \times m}$.
Now we are choosing special bases to make the matrix clearer and simpler than A.
When $B_{in} = B_{out} = B$, the square matrix $B^{-1}AB$ is similar to A.


Applied algebra Applications are all about choosing good bases. Here are four
important choices for vectors and three choices for functions. Eigenvectors and singular
vectors led to $\Lambda$ and $\Sigma$ in Section 8.2. The Jordan form is new.

1 $B_{in} = B_{out}$ = eigenvector matrix X. Then $X^{-1}AX$ = eigenvalues in $\Lambda$.
This choice requires A to be a square matrix with n independent eigenvectors.
“A must be diagonalizable.” We get $\Lambda$ when $B_{in} = B_{out}$ is the eigenvector matrix X.

2 $B_{in} = V$ and $B_{out} = U$: singular vectors of A. Then $U^{-1}AV$ = diagonal $\Sigma$.
$\Sigma$ is the singular value matrix (with $\sigma_1, \ldots, \sigma_r$ on its diagonal) when $B_{in}$ and $B_{out}$
are the singular vector matrices V and U. Recall that those columns of $B_{in}$ and $B_{out}$
are orthonormal eigenvectors of $A^TA$ and $AA^T$. Then $A = U\Sigma V^T$ gives $\Sigma = U^{-1}AV$.

3 $B_{in} = B_{out}$ = generalized eigenvectors of A. Then $B^{-1}AB$ = Jordan form J.
A is a square matrix but it may only have s independent eigenvectors. (If s = n then
B is X and J is $\Lambda$.) In all cases Jordan constructed n − s additional “generalized”
eigenvectors, aiming to make the Jordan form J as diagonal as possible:


i) There are s square blocks along the diagonal of J.


ii) Each block has one eigenvalue $\lambda$, one eigenvector, and 1’s above the diagonal.

The good case has n 1 × 1 blocks, each containing an eigenvalue. Then J is $\Lambda$ (diagonal).

Example 1 This Jordan matrix J has eigenvalues $\lambda$ = 2, 2, 3, 3 (two double eigenvalues).
Those eigenvalues lie along the diagonal because J is triangular. There are two
independent eigenvectors for $\lambda = 2$, but there is only one line of eigenvectors for $\lambda = 3$.
This will be true for every matrix $C = BJB^{-1}$ that is similar to J.

$$\text{Jordan matrix} \quad J = \begin{bmatrix} 2 & & & \\ & 2 & & \\ & & 3 & 1 \\ & & 0 & 3 \end{bmatrix} \qquad \begin{matrix} \text{Two 1 by 1 blocks} \\ \text{One 2 by 2 block} \\ \text{Three eigenvectors} \\ \text{Eigenvalues } 2, 2, 3, 3 \end{matrix}$$

Two eigenvectors for $\lambda = 2$ are $x_1 = (1, 0, 0, 0)$ and $x_2 = (0, 1, 0, 0)$. One eigenvector
for $\lambda = 3$ is $x_3 = (0, 0, 1, 0)$. The “generalized eigenvector” for this Jordan matrix is
the fourth standard basis vector $x_4 = (0, 0, 0, 1)$. The eigenvectors for J (normal and
generalized) are just the columns $x_1, x_2, x_3, x_4$ of the identity matrix I.


Notice $(J - 3I)x_4 = x_3$. The generalized eigenvector $x_4$ connects to the true
eigenvector $x_3$. A true $x_4$ would have $(J - 3I)x_4 = 0$, but that doesn’t happen here.


Every matrix $C = BJB^{-1}$ that is similar to this J will have true eigenvectors
$b_1, b_2, b_3$ in the first three columns of B. The fourth column of B will be a generalized
eigenvector $b_4$ of C, tied to the true $b_3$. Here is a quick proof that uses $Bx_3 = b_3$
and $Bx_4 = b_4$ to show: The fourth column $b_4$ is tied to $b_3$ by $(C - 3I)b_4 = b_3$.

$$(BJB^{-1} - 3I)\,b_4 = BJx_4 - 3Bx_4 = B(J - 3I)x_4 = Bx_3 = b_3. \tag{2}$$
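The chain from the generalized eigenvector to the true eigenvector can be checked for the 4 by 4 Jordan matrix of Example 1. This is a small sketch; `shifted` and `matvec` are illustrative helper names.

```python
# Jordan matrix of Example 1: two 1 by 1 blocks for 2, one 2 by 2 block for 3.
J = [[2, 0, 0, 0],
     [0, 2, 0, 0],
     [0, 0, 3, 1],
     [0, 0, 0, 3]]

def shifted(M, lam):
    """Return M - lam * I."""
    return [[M[i][j] - (lam if i == j else 0) for j in range(len(M))]
            for i in range(len(M))]

def matvec(M, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]

x1, x2, x3, x4 = [1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]

chain_step = matvec(shifted(J, 3), x4)   # (J - 3I) x4, expected to equal x3
true_eig = matvec(shifted(J, 3), x3)     # (J - 3I) x3, expected to be zero
print(chain_step, true_eig)
```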


The point of Jordan’s theorem is that every square matrix A has a complete set of
eigenvectors and generalized eigenvectors. When those go into the columns of B, the
matrix $B^{-1}AB = J$ is in Jordan form. Based on Example 1, here is a description of J.


The Jordan Form 


For every A, we want to choose B so that $B^{-1}AB$ is as nearly diagonal as possible.
When A has a full set of n eigenvectors, they go into the columns of B. Then
B = X. The matrix $X^{-1}AX$ is diagonal, period. This is the Jordan form of A when A
can be diagonalized. In the general case, eigenvectors are missing and $\Lambda$ can’t be reached.

Suppose A has s independent eigenvectors. Then it is similar to a Jordan matrix
with s blocks. Each block has an eigenvalue on the diagonal with 1’s just above it.
This block accounts for exactly one eigenvector of A. Then B contains generalized
eigenvectors as well as ordinary eigenvectors.


When there are n eigenvectors, all n blocks will be 1 by 1. In that case $J = \Lambda$.


The Jordan form solves the differential equation du/dt = Au for any square matrix
$A = BJB^{-1}$. The solution $e^{At}u(0)$ becomes $u(t) = Be^{Jt}B^{-1}u(0)$. J is triangular
and its matrix exponential $e^{Jt}$ involves $e^{\lambda t}$ times powers $1, t, \ldots, t^{s-1}$.

(Jordan form) If A has s independent eigenvectors, it is similar to a matrix J that has
s Jordan blocks $J_1, \ldots, J_s$ on its diagonal. Some matrix B puts A into Jordan form:

$$\text{Jordan form} \qquad B^{-1}AB = \begin{bmatrix} J_1 & & \\ & \ddots & \\ & & J_s \end{bmatrix} = J. \tag{3}$$

Each block $J_i$ has one eigenvalue $\lambda_i$, one eigenvector, and 1’s just above the diagonal:

$$\text{Jordan block} \qquad J_i = \begin{bmatrix} \lambda_i & 1 & & \\ & \ddots & \ddots & \\ & & \ddots & 1 \\ & & & \lambda_i \end{bmatrix}.$$

Matrices are similar if they share the same Jordan form J—not otherwise. 


The Jordan form J has an off-diagonal 1 for each missing eigenvector (and the 1’s
are next to the eigenvalues). In every family of similar matrices, we are picking one
outstanding member called J. It is nearly diagonal (or if possible completely diagonal).
We can quickly solve du/dt = Ju and take powers $J^k$. Every other matrix in the
family has the form $BJB^{-1}$.


Jordan’s Theorem is proved in my textbook Linear Algebra and Its Applications.
Please refer to that book (or more advanced books) for the proof. The reasoning is
rather intricate and in actual computations the Jordan form is not at all popular: its
calculation is not stable. A slight change in A will separate the repeated eigenvalues and
remove the off-diagonal 1’s, switching Jordan to a diagonal $\Lambda$.


Proved or not, you have caught the central idea of similarity: to make A as simple as
possible while preserving its essential properties. The best basis B gives $B^{-1}AB = J$.


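Question Find the eigenvalues and all possible Jordan forms if $A^2$ = zero matrix.
Answer The eigenvalues must all be zero, because $Ax = \lambda x$ leads to $A^2x = \lambda^2x = 0x$.
The Jordan form of A has $J^2 = 0$ because $J^2 = (B^{-1}AB)(B^{-1}AB) = B^{-1}A^2B = 0$.
Every block in J has $\lambda = 0$ on the diagonal. Look at $J_k^2$ for block sizes 1, 2, 3:

$$\begin{bmatrix} 0 \end{bmatrix}^2 = \begin{bmatrix} 0 \end{bmatrix} \qquad \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix}^2 = \begin{bmatrix} 0 & 0 \\ 0 & 0 \end{bmatrix} \qquad \begin{bmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 0 & 0 & 0 \end{bmatrix}^2 = \begin{bmatrix} 0 & 0 & 1 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix} \tag{4}$$

Conclusion: If $J^2 = 0$ then all block sizes must be 1 or 2. $J^2$ is not zero for 3 by 3.
The rank of J (and A) will be the total number of 1’s. The maximum rank is n/2.
This happens when there are n/2 blocks, each of size 2 and rank 1.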
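The block-size argument can be replayed numerically. In this sketch `zero_block(k)` is a hypothetical helper that builds the k by k Jordan block with eigenvalue 0, and `rank_from_blocks` counts the off-diagonal 1’s, which is the rank described above.

```python
def matmul(X, Y):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def zero_block(k):
    """k by k Jordan block with eigenvalue 0: ones just above the diagonal."""
    return [[1 if j == i + 1 else 0 for j in range(k)] for i in range(k)]

def rank_from_blocks(sizes):
    """Rank of a Jordan matrix with eigenvalue 0: one for each off-diagonal 1."""
    return sum(size - 1 for size in sizes)

# Square the blocks of size 1, 2, 3; only sizes 1 and 2 square to zero.
squares = {k: matmul(zero_block(k), zero_block(k)) for k in (1, 2, 3)}
for k, square in squares.items():
    print(k, square)

# With n = 6, three blocks of size 2 reach the maximum rank n/2 = 3.
print(rank_from_blocks([2, 2, 2]))
```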
Now come the great bases of applied mathematics. Their discrete forms are vectors
in $R^n$. Their continuous forms are functions in a function space. Since they are chosen
once and for all, without knowing the matrix A, these bases $B_{in} = B_{out}$ probably don’t
diagonalize A. But for many important matrices A in applied mathematics, the matrices
$B^{-1}AB$ are close to diagonal.


4 $B_{in} = B_{out}$ = Fourier matrix F. Then Fx is a Discrete Fourier Transform of x.


Those words are telling us: The Fourier matrix with columns $(1, \lambda, \lambda^2, \lambda^3)$ in equation
(6) is important. Those are good basis vectors to work with.


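We ask: Which matrices are diagonalized by F? This time we are starting with the
eigenvectors $(1, \lambda, \lambda^2, \lambda^3)$ and finding the matrices that have those eigenvectors:

$$\text{If } \lambda^4 = 1 \text{ then } \quad Px = \begin{bmatrix} 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \\ 1 & 0 & 0 & 0 \end{bmatrix} \begin{bmatrix} 1 \\ \lambda \\ \lambda^2 \\ \lambda^3 \end{bmatrix} = \begin{bmatrix} \lambda \\ \lambda^2 \\ \lambda^3 \\ 1 \end{bmatrix} = \lambda \begin{bmatrix} 1 \\ \lambda \\ \lambda^2 \\ \lambda^3 \end{bmatrix} = \lambda x. \tag{5}$$

P is a permutation matrix. The equation $Px = \lambda x$ says that x is an eigenvector
and $\lambda$ is an eigenvalue of P. Notice how the fourth row of this vector equation is
$1 = \lambda^4$. That rule for $\lambda$ makes everything work.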


Does this give four different eigenvalues $\lambda$? Yes. The four numbers $\lambda = 1, i, -1, -i$
all satisfy $\lambda^4 = 1$. (You know $i^2 = -1$. Squaring both sides gives $i^4 = 1$.) So those
four numbers are the eigenvalues of P, each with its eigenvector $x = (1, \lambda, \lambda^2, \lambda^3)$.
The eigenvector matrix F diagonalizes the permutation matrix P:

$$\text{Eigenvalue matrix } \Lambda = \begin{bmatrix} 1 & & & \\ & i & & \\ & & -1 & \\ & & & -i \end{bmatrix} \qquad \text{Eigenvector matrix is Fourier matrix } F = \begin{bmatrix} 1 & 1 & 1 & 1 \\ 1 & i & -1 & -i \\ 1 & i^2 & 1 & (-i)^2 \\ 1 & i^3 & -1 & (-i)^3 \end{bmatrix} \tag{6}$$

Those columns of F are orthogonal because they are eigenvectors of P (an orthogonal
matrix). Unfortunately this Fourier matrix F is complex (it is the most important
complex matrix in the world). Multiplications Fx are done millions of times very
quickly, by the Fast Fourier Transform. The FFT comes in Section 9.3.
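The eigenvector claim for P can be verified with Python’s built-in complex numbers. This sketch checks $Px = \lambda x$ for the four values of $\lambda$ and the orthogonality of the four columns under the complex inner product; the helper names are illustrative.

```python
# Cyclic permutation matrix P from equation (5).
P = [[0, 1, 0, 0],
     [0, 0, 1, 0],
     [0, 0, 0, 1],
     [1, 0, 0, 0]]

def matvec(M, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]

def inner(x, y):
    """Complex inner product: conjugate the first vector."""
    return sum(a.conjugate() * b for a, b in zip(x, y))

eigenvalues = [1, 1j, -1, -1j]
# Each eigenvector is (1, lam, lam^2, lam^3): one column of the Fourier matrix F.
F_columns = [[lam ** k for k in range(4)] for lam in eigenvalues]

# Largest entry of P x - lam x over the four eigenpairs (expected ~0).
residual = max(abs(p - lam * xi)
               for lam, x in zip(eigenvalues, F_columns)
               for p, xi in zip(matvec(P, x), x))

# Largest inner product between two different columns (expected ~0).
off_diagonal = max(abs(inner(F_columns[a], F_columns[b]))
                   for a in range(4) for b in range(4) if a != b)
print(residual, off_diagonal)
```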


Key question: What other matrices beyond P have this same eigenvector matrix F?
We know that $P^2$ and $P^3$ and $P^4$ have the same eigenvectors as P. The same matrix F
diagonalizes all powers of P. And the eigenvalues of $P^2$ and $P^3$ and $P^4$ are the numbers
$\lambda^2$ and $\lambda^3$ and $\lambda^4$. For example $P^2x = \lambda^2x$:

$$P^2x = \begin{bmatrix} 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \\ 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \end{bmatrix} \begin{bmatrix} 1 \\ \lambda \\ \lambda^2 \\ \lambda^3 \end{bmatrix} = \begin{bmatrix} \lambda^2 \\ \lambda^3 \\ 1 \\ \lambda \end{bmatrix} = \lambda^2 \begin{bmatrix} 1 \\ \lambda \\ \lambda^2 \\ \lambda^3 \end{bmatrix} = \lambda^2x \quad \text{when } \lambda^4 = 1.$$

The fourth power is special because $P^4 = I$. When we do the “cyclic permutation”
four times, $P^4x$ is the same vector x that we started with. The eigenvalues of $P^4 = I$ are
just 1, 1, 1, 1. And that number 1 agrees with the fourth power of all the eigenvalues of P:
$1^4 = 1$ and $i^4 = 1$ and $(-1)^4 = 1$ and $(-i)^4 = 1$.

One more step brings in many more matrices. If P and $P^2$ and $P^3$ and $P^4 = I$ have
the same eigenvector matrix F, so does any combination $C = c_1P + c_2P^2 + c_3P^3 + c_0I$:

$$\text{Circulant matrix} \qquad C = \begin{bmatrix} c_0 & c_1 & c_2 & c_3 \\ c_3 & c_0 & c_1 & c_2 \\ c_2 & c_3 & c_0 & c_1 \\ c_1 & c_2 & c_3 & c_0 \end{bmatrix} \tag{7}$$

C has eigenvectors in the Fourier matrix F.
C has four eigenvalues $c_0 + c_1\lambda + c_2\lambda^2 + c_3\lambda^3$ from the four numbers $\lambda = 1, i, -1, -i$.
The eigenvalue from $\lambda = 1$ is $c_0 + c_1 + c_2 + c_3$.


That was a big step. We have found all the matrices (circulant matrices C) whose
eigenvectors are the Fourier vectors in F. We also know the four eigenvalues of C, but
we haven’t given them a good formula or a name until now:


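The four eigenvalues of C are given by the Fourier transform Fc:

$$Fc = \begin{bmatrix} 1 & 1 & 1 & 1 \\ 1 & i & -1 & -i \\ 1 & -1 & 1 & -1 \\ 1 & -i & -1 & i \end{bmatrix} \begin{bmatrix} c_0 \\ c_1 \\ c_2 \\ c_3 \end{bmatrix} = \begin{bmatrix} c_0 + c_1 + c_2 + c_3 \\ c_0 + ic_1 - c_2 - ic_3 \\ c_0 - c_1 + c_2 - c_3 \\ c_0 - ic_1 - c_2 + ic_3 \end{bmatrix}$$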

Example 2 The same ideas work for a Fourier matrix F and a circulant matrix C of
any size. Two by two matrices look trivial but they are very useful. Now eigenvalues
of P have $\lambda^2 = 1$ instead of $\lambda^4 = 1$ and the complex number i is not needed: $\lambda = \pm 1$.

$$\text{Fourier matrix F from eigenvectors of P and C:} \quad F = \begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix} \qquad P = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix} \qquad \text{Circulant } c_0I + c_1P: \quad C = \begin{bmatrix} c_0 & c_1 \\ c_1 & c_0 \end{bmatrix}.$$

The eigenvalues of C are $c_0 + c_1$ and $c_0 - c_1$. Those are given by the Fourier transform Fc
when the vector c is $(c_0, c_1)$. This transform Fc gives the eigenvalues of C for any size n.


Notice that circulant matrices have constant diagonals. The same number $c_0$ goes
down the main diagonal. The number $c_1$ is on the diagonal above, and that diagonal
“wraps around” or “circles around” to the southwest corner of C. This explains the name
circulant and it indicates that these matrices are periodic or cyclic. Even the powers of $\lambda$
cycle around because $\lambda^4 = 1$ leads to $\lambda^5, \lambda^6, \lambda^7, \lambda^8 = \lambda, \lambda^2, \lambda^3, \lambda^4$.

Constancy down the diagonals is a crucial property of C. It corresponds to constant 
coefficients in a differential equation. This is exactly when Fourier works perfectly! 


The equation $\dfrac{d^2u}{dt^2} = -u$ is solved by $u = c_0\cos t + c_1\sin t$.
The equation $\dfrac{d^2u}{dt^2} = tu$ cannot be solved by elementary functions.

These equations are linear. The first is the oscillation equation for a simple spring.
It is Newton’s Law f = ma with mass m = 1, $a = d^2u/dt^2$, and force f = −u.


Constant coefficients produce the differential equations that you can really solve.
The equation $u'' = tu$ has a variable coefficient t. This is Airy’s equation in physics
and optics (it was derived to explain a rainbow). The solutions change completely when t
passes through zero, and those solutions require infinite series. We won’t go there.

The point is that equations with constant coefficients have simple solutions like $e^{\lambda t}$.
You discover $\lambda$ by substituting $e^{\lambda t}$ into the differential equation. That number $\lambda$ is like
an eigenvalue. For u = cos t and u = sin t the number is $\lambda = i$. Euler’s great formula
$e^{it} = \cos t + i\sin t$ introduces complex numbers as we saw in the eigenvalues of P and C.


Bases for Function Space 


For functions of x, the first basis I would think of contains the powers $1, x, x^2, x^3, \ldots$
Unfortunately this is a terrible basis. Those functions $x^n$ are just barely independent.
$x^{10}$ is almost a combination of other basis vectors $1, x, \ldots, x^9$. It is virtually impossible
to compute with this poor “ill-conditioned” basis.

If we had vectors instead of functions, the test for a good basis would look at $B^TB$.
This matrix contains all inner products between the basis vectors (columns of B).
The basis is orthonormal when $B^TB = I$. That is best possible. But the basis $1, x, x^2, \ldots$
produces the evil Hilbert matrix: $B^TB$ has an enormous ratio between its largest and
smallest eigenvalues. A large condition number signals an unhappy choice of basis.


Note Now the columns of B are functions instead of vectors. We still use $B^TB$ to
test for independence. So we need to know the dot product (inner product is a better name)
of two functions: those are the numbers in $B^TB$.

The dot product of vectors is just $x^Ty = x_1y_1 + \cdots + x_ny_n$. The inner product of
functions will integrate instead of adding, but the idea is completely parallel:


Inner product $(f, g) = \int f(x)\,g(x)\,dx$

Complex inner product $(f, g) = \int \overline{f(x)}\,g(x)\,dx$, $\overline{f}$ = complex conjugate

Weighted inner product $(f, g)_w = \int w(x)\,\overline{f(x)}\,g(x)\,dx$, w = weight function


When the integrals go from x = 0 to x = 1, the inner product of $x^i$ with $x^j$ is

$$\int_0^1 x^i x^j\,dx = \left.\frac{x^{i+j+1}}{i+j+1}\right]_{x=0}^{x=1} = \frac{1}{i+j+1} = \text{entries of Hilbert matrix } B^TB$$
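To see how close to dependent the powers are, the sketch below builds the Hilbert matrix exactly from the formula 1/(i + j + 1), with i and j starting at 0, and computes its determinant by elimination. The determinant is used here only as a rough indicator of near-dependence (an assumption of this sketch); the text’s own measure is the ratio of largest to smallest eigenvalue, which this code does not compute.

```python
from fractions import Fraction as Fr

def hilbert(n):
    """n by n Hilbert matrix: inner products of 1, x, ..., x^(n-1) on [0, 1]."""
    return [[Fr(1, i + j + 1) for j in range(n)] for i in range(n)]

def determinant(M):
    """Exact determinant by elimination without row exchanges.

    Assumes every pivot is nonzero, which holds for the Hilbert matrix
    because it is positive definite.
    """
    M = [row[:] for row in M]
    n = len(M)
    det = Fr(1)
    for k in range(n):
        pivot = M[k][k]
        det *= pivot
        for i in range(k + 1, n):
            factor = M[i][k] / pivot
            for j in range(k, n):
                M[i][j] -= factor * M[k][j]
    return det

for n in range(1, 7):
    print(n, determinant(hilbert(n)))
```

The determinants shrink very quickly as n grows, which is one symptom of the ill-conditioning described above.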
By changing to the symmetric interval from x = −1 to x = 1, we immediately have
orthogonality between all even functions and all odd functions:

$$\text{Interval } [-1, 1] \qquad \int_{-1}^{1} x^2\,x^5\,dx = 0 \qquad \int_{-1}^{1} \text{even}(x)\,\text{odd}(x)\,dx = 0.$$

This change makes half of the basis functions orthogonal to the other half. It is so simple
that we continue using the symmetric interval −1 to 1 (or $-\pi$ to $\pi$). But we want a better
basis than the powers $x^n$, hopefully an orthogonal basis.


Orthogonal Bases for Function Space


Here are the three leading even-odd bases for theoretical and numerical computations:


5. The Fourier basis $1, \sin x, \cos x, \sin 2x, \cos 2x, \ldots$
6. The Legendre basis $1, x, x^2 - \frac{1}{3}, x^3 - \frac{3}{5}x, \ldots$
7. The Chebyshev basis $1, x, 2x^2 - 1, 4x^3 - 3x, \ldots$


The Fourier basis functions (sines and cosines) are all periodic. They repeat over every
$2\pi$ interval because $\cos(x + 2\pi) = \cos x$ and $\sin(x + 2\pi) = \sin x$. So this basis is especially
good for functions f(x) that are themselves periodic: $f(x + 2\pi) = f(x)$.

This basis is also orthogonal. Every sine and cosine is orthogonal to every other sine
and cosine. Of course we don’t expect the basis function cos nx to be orthogonal to itself.


Most important, the sine-cosine basis is also excellent for approximation. If we have
a smooth periodic function f(x), then a few sines and cosines (low frequencies) are all
we need. Jumps in f(x) and noise in the signal are seen in higher frequencies (larger n).
We hope and expect that the signal is not drowned by the noise.

The Fourier transform connects f(x) to the coefficients $a_k$ and $b_k$ in its Fourier series:


Fourier series $f(x) = a_0 + b_1\sin x + a_1\cos x + b_2\sin 2x + a_2\cos 2x + \cdots$


We see that function space is infinite-dimensional. It takes infinitely many basis func- 
tions to capture perfectly a typical f(x). But the formula for each coefficient (for example 
a3) is just like the formula bia /a* a for projecting a vector b onto the line through a. 

Here we are projecting the function f(z) onto the line in function space through cos 32: 


: : (f(x), cos 3x) | P(e) eos tdr 
F fficient = AL = S E, I 
N (cos 3z,cos 32) [cas 3r cos 3x dz ) 
Example 3 The double angle formula in trigonometry is $\cos 2x = 2\cos^2 x - 1$. This tells 
us that $\cos^2 x = \tfrac{1}{2} + \tfrac{1}{2}\cos 2x$. A very short Fourier series. So is $\sin^2 x = \tfrac{1}{2} - \tfrac{1}{2}\cos 2x$. 
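
The projection formula can be tried numerically on the short series in Example 3. The sketch below is only an illustration: the helper names `inner` and `cos_coefficient` and the midpoint-rule integration over $-\pi$ to $\pi$ are assumptions made here, not part of the text.

```python
import math

def inner(f, g, n=20000):
    """Midpoint-rule approximation of the integral of f(x) g(x) over [-pi, pi]."""
    h = 2 * math.pi / n
    total = 0.0
    for j in range(n):
        x = -math.pi + (j + 0.5) * h
        total += f(x) * g(x)
    return total * h

def cos_coefficient(f, k):
    """Project f onto cos(kx): (f, cos kx) / (cos kx, cos kx)."""
    basis = lambda x: math.cos(k * x)
    return inner(f, basis) / inner(basis, basis)

f = lambda x: math.cos(x) ** 2          # the function from Example 3
a0 = cos_coefficient(f, 0)              # constant term, expected near 1/2
a2 = cos_coefficient(f, 2)              # coefficient of cos 2x, expected near 1/2
a3 = cos_coefficient(f, 3)              # cos 3x is absent, expected near 0
print(a0, a2, a3)
```

Only $a_0$ and $a_2$ come out nonzero (up to rounding), which agrees with $\cos^2 x = \tfrac12 + \tfrac12\cos 2x$.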


Fourier series is just linear algebra in function space. Let me explain that properly 
as a highlight of Chapter 10 about applications. 

Legendre Polynomials and Chebyshev Polynomials 


The Legendre polynomials are the result of applying the Gram-Schmidt idea (Section 4.4). 
The plan is to orthogonalize the powers $1, x, x^2, \ldots$ To start, the odd function $x$ is already 
orthogonal to the even function $1$ over the interval from $-1$ to $1$. Their product $(x)(1) = x$ 
integrates to zero. But the inner product between $x^2$ and $1$ is $\int x^2\,dx = 2/3$: 

$$\frac{(x^2, 1)}{(1, 1)} = \frac{\int x^2\,dx}{\int 1\,dx} = \frac{2/3}{2} = \frac{1}{3} \qquad \text{Gram-Schmidt gives } x^2 - \tfrac{1}{3} = \textbf{Legendre}$$

Similarly the odd power $x^3$ has a component $3x/5$ in the direction of the odd function $x$: 

$$\frac{(x^3, x)}{(x, x)} = \frac{\int x^4\,dx}{\int x^2\,dx} = \frac{2/5}{2/3} = \frac{3}{5} \qquad \text{Gram-Schmidt gives } x^3 - \tfrac{3}{5}x = \textbf{Legendre}$$

Continuing Gram-Schmidt for $x^4, x^5, \ldots$ produces every Legendre function, a good basis. 
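
The two Gram-Schmidt steps above can be repeated mechanically. The following sketch is an assumption-laden illustration, not the book's code: polynomials are stored as coefficient lists (lowest power first), the names `inner` and `gram_schmidt_powers` are invented here, and exact fractions are used so that $1/3$ and $3/5$ appear without rounding.

```python
from fractions import Fraction

def inner(p, q):
    """Exact integral of p(x) q(x) over [-1, 1] for coefficient lists p and q."""
    total = Fraction(0)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            if (i + j) % 2 == 0:                  # odd powers integrate to zero
                total += Fraction(a) * Fraction(b) * Fraction(2, i + j + 1)
    return total

def gram_schmidt_powers(count):
    """Orthogonalize 1, x, x^2, ... keeping the leading coefficient equal to 1."""
    basis = []
    for n in range(count):
        p = [Fraction(0)] * n + [Fraction(1)]     # the power x^n
        for q in basis:
            c = inner(p, q) / inner(q, q)         # component of p along q
            for k, qk in enumerate(q):
                p[k] -= c * qk                    # subtract that component
        basis.append(p)
    return basis

legendre = gram_schmidt_powers(4)
for p in legendre:
    print([str(c) for c in p])
```

The third and fourth lists hold the coefficients of $x^2 - \tfrac13$ and $x^3 - \tfrac35 x$. These are Legendre polynomials up to the usual scaling, since no normalization is applied.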


Finally we turn to the Chebyshev polynomials $1,\ x,\ 2x^2 - 1,\ 4x^3 - 3x$. They don't 
come from Gram-Schmidt. Instead they are connected to $1,\ \cos\theta,\ \cos 2\theta,\ \cos 3\theta$. This 
gives a giant computational advantage: we can use the Fast Fourier Transform. 
The connection of Chebyshev to Fourier appears when we set $x = \cos\theta$: 

Chebyshev to Fourier 
$$2x^2 - 1 = 2(\cos\theta)^2 - 1 = \cos 2\theta$$
$$4x^3 - 3x = 4(\cos\theta)^3 - 3(\cos\theta) = \cos 3\theta$$

The $n$th degree Chebyshev polynomial $T_n(x)$ converts to Fourier's $\cos n\theta = T_n(\cos\theta)$. 
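
A quick numerical check of $T_n(\cos\theta) = \cos n\theta$ is sketched below. It relies on the standard three-term recurrence $T_{k+1}(x) = 2x\,T_k(x) - T_{k-1}(x)$, which is not stated in the text above and is brought in here as an assumption; the function name `chebyshev` and the sample angle are also just for illustration.

```python
import math

def chebyshev(n, x):
    """Evaluate T_n(x) by the recurrence T_{k+1} = 2x T_k - T_{k-1}."""
    t_prev, t = 1.0, x                  # T_0 = 1 and T_1 = x
    if n == 0:
        return t_prev
    for _ in range(n - 1):
        t_prev, t = t, 2 * x * t - t_prev
    return t

theta = 0.7                             # any sample angle
x = math.cos(theta)
errors = [abs(chebyshev(n, x) - math.cos(n * theta)) for n in range(6)]
print(max(errors))                      # tiny: only rounding error remains
```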


Note These polynomials are the basis for a big software project called “chebfun”. 
Every function f(x) is replaced by a super-accurate Chebyshev approximation. Then you 
can integrate f(x), and solve f(x) = 0, and find its maximum or minimum. More than 
that, you can solve differential equations involving f (x)—fast and to high accuracy. 
When chebfun replaces f(x) by a polynomial, you are ready to solve problems. 


= REVIEW OF THE KEY IDEAS = 


1. A basis is good if its matrix $B$ is well-conditioned. Orthogonal bases are best. 
2. Also good if $\Lambda = B^{-1}AB$ is diagonal. But the Jordan form $J$ can be very unstable. 
3. The Fourier matrix diagonalizes constant-coefficient periodic equations: perfection. 
4. The basis $1, x, x^2, \ldots$ leads to $B^T B$ = Hilbert matrix: terrible for computations. 


5. Legendre and Chebyshev polynomials are excellent bases for function space. 

Problem Set 8.3 


1 In Example 1, what is the rank of $J - 3I$? What is the dimension of its nullspace? 
This dimension gives the number of independent eigenvectors for $\lambda = 3$. 

The algebraic multiplicity is 2, because $\det(J - \lambda I)$ has the repeated factor $(\lambda - 3)^2$. 
The geometric multiplicity is 1, because there is only 1 independent eigenvector. 

2 These matrices $A_1$ and $A_2$ are similar to $J$. Solve $A_1 B_1 = B_1 J$ and $A_2 B_2 = B_2 J$ 
to find the basis matrices $B_1$ and $B_2$ with $J = B_1^{-1} A_1 B_1$ and $J = B_2^{-1} A_2 B_2$. 

$$J = \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix} \qquad A_1 = \begin{bmatrix} 0 & 4 \\ 0 & 0 \end{bmatrix} \qquad A_2 = \begin{bmatrix} 4 & -8 \\ 2 & -4 \end{bmatrix}$$

3 This transpose block $J^T$ has the same triple eigenvalue 2 (with only one eigenvector) 
as $J$. Find the basis change $B$ so that $J = B^{-1} J^T B$ (which means $BJ = J^T B$): 

$$J = \begin{bmatrix} 2 & 1 & 0 \\ 0 & 2 & 1 \\ 0 & 0 & 2 \end{bmatrix} \qquad J^T = \begin{bmatrix} 2 & 0 & 0 \\ 1 & 2 & 0 \\ 0 & 1 & 2 \end{bmatrix}$$

4 $J$ and $K$ are Jordan forms with the same zero eigenvalues and the same rank 2. But 
show that no invertible $B$ solves $BK = JB$, so $K$ is not similar to $J$: 

$$J = \begin{bmatrix} 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 \end{bmatrix} \qquad K = \begin{bmatrix} 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{bmatrix}$$

5 If $A^3 = 0$ show that all $\lambda = 0$, and all Jordan blocks with $J^3 = 0$ have size 1, 2, or 
3. It follows that rank$(A) \le 2n/3$. If $A^n = 0$ why is rank$(A) < n$? 

6 Show that $u(t) = \begin{bmatrix} te^{\lambda t} \\ e^{\lambda t} \end{bmatrix}$ solves $\dfrac{du}{dt} = Ju$ with $J = \begin{bmatrix} \lambda & 1 \\ 0 & \lambda \end{bmatrix}$ and $u(0) = \begin{bmatrix} 0 \\ 1 \end{bmatrix}$. 
$J$ is not diagonalizable so $te^{\lambda t}$ enters the solution. 

7 Show that the difference equation $v_{k+2} - 2\lambda v_{k+1} + \lambda^2 v_k = 0$ is solved by 
$v_k = \lambda^k$ and also by $v_k = k\lambda^k$. Those correspond to $e^{\lambda t}$ and $te^{\lambda t}$ in Problem 6. 

8 What are the 3 solutions to $\lambda^3 = 1$? They are complex numbers $\lambda = \cos\theta + i\sin\theta = e^{i\theta}$. 
Then $\lambda^3 = e^{3i\theta} = 1$ when the angle $3\theta$ is $0$ or $2\pi$ or $4\pi$. Write the 3 by 3 Fourier 
matrix $F$ with columns $(1, \lambda, \lambda^2)$. 

9 Check that any 3 by 3 circulant $C$ has eigenvectors $(1, \lambda, \lambda^2)$ from Problem 8. 
If the diagonals of your matrix $C$ contain $c_0, c_1, c_2$ then its eigenvalues are in $Fc$. 

10 Using formula (7) find $a_3 \cos 3x$ in the Fourier series of 
$$f(x) = \begin{cases} 1 & \text{for } -L \le x \le L \\ 0 & \text{for } L < |x| < 2\pi \end{cases}$$



Chapter 9 


Complex Vectors and Matrices 


Real versus Complex 

$\mathbf{R}$ = line of all real numbers $-\infty < x < \infty$ ↔ $\mathbf{C}$ = plane of all complex numbers $z = x + iy$ 
$|x|$ = absolute value of $x$ ↔ $|z| = \sqrt{x^2 + y^2} = r$ = absolute value (or modulus) of $z$ 
$1$ and $-1$ solve $x^2 = 1$ ↔ $z = 1, w, \ldots, w^{n-1}$ solve $z^n = 1$ where $w = e^{2\pi i/n}$ 

The complex conjugate of $z = x + iy$ is $\bar{z} = x - iy$. $|z|^2 = x^2 + y^2 = z\bar{z}$ and $\dfrac{1}{z} = \dfrac{\bar{z}}{|z|^2}$. 

The polar form of $z = x + iy$ is $|z|e^{i\theta} = re^{i\theta} = r\cos\theta + ir\sin\theta$. The angle has $\tan\theta = \dfrac{y}{x}$. 

$\mathbf{R}^n$: vectors with $n$ real components ↔ $\mathbf{C}^n$: vectors with $n$ complex components 
length: $\|x\|^2 = x_1^2 + \cdots + x_n^2$ ↔ length: $\|z\|^2 = |z_1|^2 + \cdots + |z_n|^2$ 
transpose: $(A^T)_{ij} = A_{ji}$ ↔ conjugate transpose: $(A^H)_{ij} = \overline{A_{ji}}$ 
dot product: $x^T y = x_1 y_1 + \cdots + x_n y_n$ ↔ inner product: $u^H v = \bar{u}_1 v_1 + \cdots + \bar{u}_n v_n$ 
reason for $A^T$: $(Ax)^T y = x^T(A^T y)$ ↔ reason for $A^H$: $(Au)^H v = u^H(A^H v)$ 
orthogonality: $x^T y = 0$ ↔ orthogonality: $u^H v = 0$ 
symmetric matrices: $S = S^T$ ↔ Hermitian matrices: $S = S^H$ 
$S = Q\Lambda Q^{-1} = Q\Lambda Q^T$ (real $\Lambda$) ↔ $S = U\Lambda U^{-1} = U\Lambda U^H$ (real $\Lambda$) 
skew-symmetric matrices: $K^T = -K$ ↔ skew-Hermitian matrices: $K^H = -K$ 
orthogonal matrices: $Q^T = Q^{-1}$ ↔ unitary matrices: $U^H = U^{-1}$ 
orthonormal columns: $Q^T Q = I$ ↔ orthonormal columns: $U^H U = I$ 
$(Qx)^T(Qy) = x^T y$ and $\|Qx\| = \|x\|$ ↔ $(Ux)^H(Uy) = x^H y$ and $\|Uz\| = \|z\|$ 


A complete presentation of linear algebra must include complex numbers $z = x + iy$. 
Even when the matrix is real, the eigenvalues and eigenvectors are often complex. 
Example: A 2 by 2 rotation matrix has complex eigenvectors $x = (1, i)$ and $\bar{x} = (1, -i)$. 
I will summarize Sections 9.1 and 9.2 in these few unforgettable words: When you 
transpose a vector $v$ or a matrix $A$, take the conjugate of every entry ($i$ changes to $-i$). 
Section 9.3 is about the most important complex matrix of all, the Fourier matrix $F$. 

9.1 Complex Numbers 


Start with the imaginary number $i$. Everybody knows that $x^2 = -1$ has no real solution. 
When you square a real number, the answer is never negative. So the world has agreed on 
a solution called $i$. (Except that electrical engineers call it $j$.) Imaginary numbers follow 
the normal rules of addition and multiplication, with one difference. Replace $i^2$ by $-1$. 
This section gives the main facts about complex numbers. It is a review for some 
students and a reference for everyone. Everything comes from $i^2 = -1$ and $e^{2\pi i} = 1$. 


A complex number (say 3 + 2i) is a real number (3) plus an imaginary number (2i). 
Addition keeps the real and imaginary parts separate. Multiplication uses $i^2 = -1$: 

Add: $(3 + 2i) + (3 + 2i) = 6 + 4i$ 
Multiply: $(3 + 2i)(1 - i) = 3 + 2i - 3i - 2i^2 = 5 - i$. 

If I add $3 + i$ to $1 - i$, the answer is $4$. The real numbers $3 + 1$ stay separate from the 
imaginary numbers $i - i$. We are adding the vectors $(3, 1)$ and $(1, -1)$ to get $(4, 0)$. 
The number $(1 + i)^2$ is $1 + i$ times $1 + i$. The rules give the surprising answer $2i$: 

$$(1 + i)(1 + i) = 1 + i + i + i^2 = 2i.$$

In the complex plane, $1 + i$ is at an angle of $45^\circ$. It is like the vector $(1, 1)$. When we square 
$1 + i$ to get $2i$, the angle doubles to $90^\circ$. If we square again, the answer is $(2i)^2 = -4$. The 
$90^\circ$ angle doubled to $180^\circ$, the direction of a negative real number. 
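
These small computations are easy to reproduce. The sketch below uses Python's built-in complex type, where the imaginary unit is written `1j` (the electrical engineers' letter); the variable names are chosen here only for illustration.

```python
z = 3 + 2j
total = z + z                  # (3 + 2i) + (3 + 2i)
product = z * (1 - 1j)         # (3 + 2i)(1 - i), using i^2 = -1
square = (1 + 1j) ** 2         # the angle 45 degrees doubles to 90 degrees
fourth = square ** 2           # the angle doubles again to 180 degrees
print(total, product, square, fourth)
```

The four values are $6 + 4i$, $5 - i$, $2i$ and $-4$, matching the hand computations above.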

A real number is just a complex number z = a + bi, with zero imaginary part: b = 0. 


The real part is $a = \operatorname{Re}(a + bi)$. The imaginary part is $b = \operatorname{Im}(a + bi)$. 


The Complex Plane 


Complex numbers correspond to points in a plane. Real numbers go along the x axis. Pure 
imaginary numbers are on the y axis. The complex number 3 + 2i is at the point with 
coordinates (3, 2). The number zero, which is 0 + 0i, is at the origin. 

Adding and subtracting complex numbers is like adding and subtracting vectors in the 
plane. The real component stays separate from the imaginary component. The vectors go 
head-to-tail as usual. The complex plane $\mathbf{C}^1$ is like the ordinary two-dimensional plane $\mathbf{R}^2$, 
except that we multiply complex numbers and we didn't multiply vectors. 

Now comes an important idea. The complex conjugate of $3 + 2i$ is $3 - 2i$. The 
complex conjugate of $z = 1 - i$ is $\bar{z} = 1 + i$. In general the conjugate of $z = a + bi$ is 
$\bar{z} = a - bi$. (Some writers use a "bar" on the number and others use a "star": $\bar{z} = z^*$.) 
The imaginary parts of $z$ and "$z$ bar" have opposite signs. In the complex plane, $\bar{z}$ is the 
image of $z$ on the other side of the real axis. 

[Figure 9.1 shows the complex plane with the real axis, the unit circle, the point $z = 3 + 2i$ at distance $r = |z| = \sqrt{3^2 + 2^2}$ from the origin, and its conjugate $\bar{z} = 3 - 2i$.] 

Figure 9.1: The number $z = a + bi$ corresponds to the point $(a, b)$ and the vector $\begin{bmatrix} a \\ b \end{bmatrix}$. 


Two useful facts. When we multiply conjugates $\bar{z}_1$ and $\bar{z}_2$, we get the conjugate of $z_1 z_2$. 
And when we add $\bar{z}_1$ and $\bar{z}_2$, we get the conjugate of $z_1 + z_2$: 

$\bar{z}_1 + \bar{z}_2 = (3 - 2i) + (1 + i) = 4 - i$. This is the conjugate of $z_1 + z_2 = 4 + i$. 
$\bar{z}_1 \times \bar{z}_2 = (3 - 2i) \times (1 + i) = 5 + i$. This is the conjugate of $z_1 \times z_2 = 5 - i$. 

Adding and multiplying is exactly what linear algebra needs. By taking conjugates of 
$Ax = \lambda x$, when $A$ is real, we have another eigenvalue $\bar{\lambda}$ and its eigenvector $\bar{x}$: 

Eigenvalues $\lambda$ and $\bar{\lambda}$: If $Ax = \lambda x$ and $A$ is real then $A\bar{x} = \bar{\lambda}\bar{x}$. (1) 

Something special happens when $z = 3 + 2i$ combines with its own complex conjugate 
$\bar{z} = 3 - 2i$. The result from adding $z + \bar{z}$ or multiplying $z\bar{z}$ is always real: 

$z + \bar{z}$ = real: $(3 + 2i) + (3 - 2i) = 6$ (real) 
$z\bar{z}$ = real: $(3 + 2i) \times (3 - 2i) = 9 + 6i - 6i - 4i^2 = 13$ (real). 


The sum of $z = a + bi$ and its conjugate $\bar{z} = a - bi$ is the real number $2a$. The product of 
$z$ times $\bar{z}$ is the real number $a^2 + b^2$: 

Multiply $z$ times $\bar{z}$ to get $|z|^2 = r^2$: $(a + bi)(a - bi) = a^2 + b^2$. (2) 

The next step with complex numbers is $1/z$. How to divide by $a + ib$? The best idea 
is to multiply first by $\bar{z}/\bar{z} = 1$. That produces $z\bar{z}$ in the denominator, which is $a^2 + b^2$: 

$$\frac{1}{a + ib} = \frac{1}{a + ib}\,\frac{a - ib}{a - ib} = \frac{a - ib}{a^2 + b^2} \qquad\qquad \frac{1}{3 + 2i} = \frac{1}{3 + 2i}\,\frac{3 - 2i}{3 - 2i} = \frac{3 - 2i}{13}$$

In case $a^2 + b^2 = 1$, this says that $(a + ib)^{-1}$ is $a - ib$. On the unit circle, $1/z$ equals $\bar{z}$. 
Later we will say: $1/e^{i\theta}$ is $e^{-i\theta}$. Use distance $r$ and angle $\theta$ to multiply and divide. 

The Polar Form $re^{i\theta}$ 

The square root of $a^2 + b^2$ is $|z|$. This is the absolute value (or modulus) of the number 
$z = a + ib$. The square root $|z|$ is also written $r$, because it is the distance from $0$ to $z$. The 
real number $r$ in the polar form gives the size of the complex number $z$: 

The absolute value of $z = a + ib$ is $|z| = \sqrt{a^2 + b^2}$. This is called $r$. 

The absolute value of $z = 3 + 2i$ is $|z| = \sqrt{3^2 + 2^2}$. This is $r = \sqrt{13}$. 

The other part of the polar form is the angle $\theta$. The angle for $z = 5$ is $\theta = 0$ (because this $z$ 
is real and positive). The angle for $z = 3i$ is $\pi/2$ radians. The angle for a negative $z = -9$ 
is $\pi$ radians. The angle doubles when the number is squared. The polar form is excellent 
for multiplying complex numbers (not good for addition). 

When the distance is $r$ and the angle is $\theta$, trigonometry gives the other two sides of the 
triangle. The real part (along the bottom) is $a = r\cos\theta$. The imaginary part (up or down) 
is $b = r\sin\theta$. Put those together, and the rectangular form becomes the polar form $re^{i\theta}$. 

The number $z = a + ib$ is also $z = r\cos\theta + ir\sin\theta$. This is $re^{i\theta}$. 

Note: $\cos\theta + i\sin\theta$ has absolute value $r = 1$ because $\cos^2\theta + \sin^2\theta = 1$. Thus 
$\cos\theta + i\sin\theta$ lies on the circle of radius 1, the unit circle. 


Example 1 Find $r$ and $\theta$ for $z = 1 + i$ and also for the conjugate $\bar{z} = 1 - i$. 

Solution The absolute value is the same for $z$ and $\bar{z}$. It is $r = \sqrt{1 + 1} = \sqrt{2}$: 

$$|z|^2 = 1^2 + 1^2 = 2 \quad\text{and also}\quad |\bar{z}|^2 = 1^2 + (-1)^2 = 2.$$

The distance from the center is $r = \sqrt{2}$. What about the angle $\theta$? The number $1 + i$ 
is at the point $(1, 1)$ in the complex plane. The angle to that point is $\pi/4$ radians or $45^\circ$. 
The cosine is $1/\sqrt{2}$ and the sine is $1/\sqrt{2}$. Combining $r$ and $\theta$ brings back $z = 1 + i$: 

$$r\cos\theta + ir\sin\theta = \sqrt{2}\left(\frac{1}{\sqrt{2}}\right) + i\sqrt{2}\left(\frac{1}{\sqrt{2}}\right) = 1 + i.$$

The angle to the conjugate $1 - i$ can be positive or negative. We can go to $7\pi/4$ radians 
which is $315^\circ$. Or we can go backwards through a negative angle, to $-\pi/4$ radians or 
$-45^\circ$. If $z$ is at angle $\theta$, its conjugate $\bar{z}$ is at $2\pi - \theta$ and also at $-\theta$. 

We can freely add $2\pi$ or $4\pi$ or $-2\pi$ to any angle! Those go full circles so the final point 
is the same. This explains why there are infinitely many choices of $\theta$. Often we select the 
angle between $0$ and $2\pi$. But $-\theta$ is very useful for the conjugate $\bar{z}$. And $1 = e^0 = e^{2\pi i}$. 

Powers and Products: Polar Form 


Computing $(1 + i)^2$ and $(1 + i)^8$ is quickest in polar form. That form has $r = \sqrt{2}$ and 
$\theta = \pi/4$ (or $45^\circ$). If we square the absolute value to get $r^2 = 2$, and double the angle to 
get $2\theta = \pi/2$ (or $90^\circ$), we have $(1 + i)^2$. For the eighth power we need $r^8$ and $8\theta$: 

$$(1 + i)^8: \qquad r^8 = 2 \cdot 2 \cdot 2 \cdot 2 = 16 \quad\text{and}\quad 8\theta = 8 \cdot \frac{\pi}{4} = 2\pi.$$

This means: $(1 + i)^8$ has absolute value 16 and angle $2\pi$. So $(1 + i)^8 = 16$. 
Powers are easy in polar form. So is multiplication of complex numbers. 
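
The polar-form computation of $(1 + i)^8$ can be mirrored with Python's standard `cmath` module. This is a sketch for illustration: `cmath.polar` returns $(r, \theta)$ and `cmath.rect` rebuilds $r\cos\theta + ir\sin\theta$; the variable names are assumptions made here.

```python
import cmath
import math

z = 1 + 1j
r, theta = cmath.polar(z)                    # r = sqrt(2), theta = pi/4
conj_angle = cmath.phase(z.conjugate())      # the conjugate sits at -theta

# Eighth power in polar form: raise r to the 8th power, multiply the angle by 8.
eighth = cmath.rect(r ** 8, 8 * theta)
direct = z ** 8                              # same power by repeated multiplication
print(r, theta, conj_angle, eighth, direct)
```

Both routes give 16 up to rounding error, and the angle of the conjugate is $-\pi/4$.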


The $n$th power of $z = r(\cos\theta + i\sin\theta)$ is $z^n = r^n(\cos n\theta + i\sin n\theta)$. (3) 

In that case $z$ multiplies itself. To multiply $z$ times $z'$, multiply $r$'s and add angles: 

$$r(\cos\theta + i\sin\theta) \text{ times } r'(\cos\theta' + i\sin\theta') = rr'\big(\cos(\theta + \theta') + i\sin(\theta + \theta')\big). \qquad (4)$$

One way to understand this is by trigonometry. Why do we get the double angle $2\theta$ for $z^2$? 

$$(\cos\theta + i\sin\theta) \times (\cos\theta + i\sin\theta) = \cos^2\theta + i^2\sin^2\theta + 2i\sin\theta\cos\theta.$$

The real part $\cos^2\theta - \sin^2\theta$ is $\cos 2\theta$. The imaginary part $2\sin\theta\cos\theta$ is $\sin 2\theta$. Those are 
the "double angle" formulas. They show that $\theta$ in $z$ becomes $2\theta$ in $z^2$. 

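There is a second way to understand the rule for $z^n$. It uses the only amazing formula 
in this section. Remember that $\cos\theta + i\sin\theta$ has absolute value 1. The cosine is made up 
of even powers, starting with $1 - \tfrac{1}{2}\theta^2$. The sine is made up of odd powers, starting with 
$\theta - \tfrac{1}{6}\theta^3$. The beautiful fact is that $e^{i\theta}$ combines both of those series into $\cos\theta + i\sin\theta$: 

$$e^x = 1 + x + \frac{1}{2}x^2 + \frac{1}{6}x^3 + \cdots \quad\text{becomes}\quad e^{i\theta} = 1 + i\theta + \frac{1}{2}i^2\theta^2 + \frac{1}{6}i^3\theta^3 + \cdots$$

Write $-1$ for $i^2$ to see $1 - \tfrac{1}{2}\theta^2$. The complex number $e^{i\theta}$ is $\cos\theta + i\sin\theta$: 

Euler's Formula: $e^{i\theta} = \cos\theta + i\sin\theta$ gives $z = r\cos\theta + ir\sin\theta = re^{i\theta}$ (5) 

The special choice $\theta = 2\pi$ gives $\cos 2\pi + i\sin 2\pi$ which is 1. Somehow the infinite series 
$e^{2\pi i} = 1 + 2\pi i + \tfrac{1}{2}(2\pi i)^2 + \cdots$ adds up to 1. 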
Now multiply $e^{i\theta}$ times $e^{i\theta'}$. Angles add for the same reason that exponents add: 

$e^2$ times $e^3$ is $e^5$ $\qquad$ $e^{i\theta}$ times $e^{i\theta}$ is $e^{2i\theta}$ $\qquad$ $e^{i\theta}$ times $e^{i\theta'}$ is $e^{i(\theta + \theta')}$ 

The powers $(re^{i\theta})^n$ are equal to $r^n e^{in\theta}$. They stay on the unit circle when $r = 1$ 
and $r^n = 1$. Then we find $n$ different numbers whose $n$th powers equal 1: 

Set $w = e^{2\pi i/n}$. The $n$th powers of $1, w, w^2, \ldots, w^{n-1}$ all equal 1. 

Those are the "$n$th roots of 1." They solve the equation $z^n = 1$. They are equally spaced 
around the unit circle in Figure 9.2b, where the full $2\pi$ is divided by $n$. Multiply their 
angles by $n$ to take $n$th powers. That gives $w^n = e^{2\pi i}$ which is 1. Also $(w^2)^n = e^{4\pi i} = 1$. 
Each of those numbers, to the $n$th power, comes around the unit circle to 1. 

These $n$ roots of 1 are the key numbers for signal processing. The Discrete Fourier 
Transform uses $w = e^{2\pi i/n}$ and its powers. Section 9.3 shows how to decompose a vector 
(a signal) into $n$ frequencies by the Fast Fourier Transform. 
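
The $n$th roots of 1 are simple to generate. The sketch below takes $n = 6$ as an arbitrary sample; the function name `roots_of_unity` is an assumption made for this illustration. It also adds the roots together, since equally spaced points around the circle balance out to zero.

```python
import cmath

def roots_of_unity(n):
    """Return 1, w, w^2, ..., w^(n-1) with w = e^(2 pi i / n)."""
    w = cmath.exp(2j * cmath.pi / n)
    return [w ** k for k in range(n)]

roots = roots_of_unity(6)
powers = [z ** 6 for z in roots]     # each sixth power returns to 1
total = sum(roots)                   # the six roots add to 0
print(powers)
print(total)
```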


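[Figure 9.2 has two panels. Panel (a) adds the angles $\theta + \theta'$ to reach $e^{i(\theta + \theta')}$. Panel (b) marks the 6 solutions to $z^6 = 1$ on the unit circle, including $e^{2\pi i/6}$, $e^{8\pi i/6}$, $e^{10\pi i/6}$ and $e^{12\pi i/6} = e^{2\pi i} = 1$.] 

Figure 9.2: (a) $e^{i\theta}$ times $e^{i\theta'}$ is $e^{i(\theta + \theta')}$. (b) The $n$th power of $e^{2\pi i/n}$ is $e^{2\pi i} = 1$. 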


= REVIEW OF THE KEY IDEAS = 


1. Adding $a + ib$ to $c + id$ is like adding $(a, b) + (c, d)$. Use $i^2 = -1$ to multiply. 
2. The conjugate of $z = a + bi = re^{i\theta}$ is $\bar{z} = z^* = a - bi = re^{-i\theta}$. 
3. $z$ times $\bar{z}$ is $re^{i\theta}$ times $re^{-i\theta}$. This is $r^2 = |z|^2 = a^2 + b^2$ (real). 
4. Powers and products are easy in polar form $z = re^{i\theta}$. Multiply $r$'s and add $\theta$'s. 

Problem Set 9.1 


Questions 1-8 are about operations on complex numbers. 


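1 Add and multiply each pair of complex numbers: 

(a) $2 + i,\ 2 - i$ (b) $-1 + i,\ -1 + i$ (c) $\cos\theta + i\sin\theta,\ \cos\theta - i\sin\theta$ 

2 Locate these points on the complex plane. Simplify them if necessary: 

(a) $2 + i$ (b) $(2 + i)^2$ (c) $\dfrac{1}{2 + i}$ (d) $|2 + i|$ 

3 Find the absolute value $r = |z|$ of these four numbers. If $\theta$ is the angle for $6 - 8i$, 
what are the angles for the other three numbers? 

(a) $6 - 8i$ (b) $(6 - 8i)^2$ (c) $\dfrac{1}{6 - 8i}$ (d) $(6 + 8i)^2$ 

4 If $|z| = 2$ and $|w| = 3$ then $|z \times w| = $ ___ and $|z + w| \le $ ___ and $|z/w| = $ ___ 
and $|z - w| \le $ ___. 

5 Find $a + ib$ for the numbers at angles $30^\circ, 60^\circ, 90^\circ, 120^\circ$ on the unit circle. If $w$ is 
the number at $30^\circ$, check that $w^2$ is at $60^\circ$. What power of $w$ equals 1? 

6 If $z = r\cos\theta + ir\sin\theta$ then $1/z$ has absolute value ___ and angle ___. Its 
polar form is ___. Multiply $z \times 1/z$ to get 1. 

7 The complex multiplication $M = (a + bi)(c + di)$ is a 2 by 2 real multiplication 

$$\begin{bmatrix} a & -b \\ b & a \end{bmatrix}\begin{bmatrix} c \\ d \end{bmatrix} = \begin{bmatrix} \phantom{M} \\ \phantom{M} \end{bmatrix}.$$

The right side contains the real and imaginary parts of $M$. Test $M = (1 + 3i)(1 - 3i)$. 

8 $A = A_1 + iA_2$ is a complex $n$ by $n$ matrix and $b = b_1 + ib_2$ is a complex vector. 
The solution to $Ax = b$ is $x_1 + ix_2$. Write $Ax = b$ as a real system of size $2n$: 

Complex $n$ by $n$, real $2n$ by $2n$: $\begin{bmatrix} \phantom{M} & \phantom{M} \\ \phantom{M} & \phantom{M} \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} b_1 \\ b_2 \end{bmatrix}.$ 
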
Questions 9-16 are about the conjugate $\bar{z} = a - ib = re^{-i\theta} = z^*$. 

9 Write down the complex conjugate of each number by changing $i$ to $-i$: 

(a) $2 - i$ (b) $(2 - i)(1 - i)$ (c) $e^{i\pi/2}$ (which is $i$) 
(d) $e^{i\pi} = -1$ (e) $\dfrac{1 + i}{1 - i}$ (which is also $i$) (f) $i^{103} = $ ___ 

10 The sum $z + \bar{z}$ is always ___. The difference $z - \bar{z}$ is always ___. Assume 
$z \ne 0$. The product $z \times \bar{z}$ is always ___. The ratio $z/\bar{z}$ has absolute value ___. 

11 For a real matrix, the conjugate of $Ax = \lambda x$ is $A\bar{x} = \bar{\lambda}\bar{x}$. This proves two things: 
$\bar{\lambda}$ is another eigenvalue and $\bar{x}$ is its eigenvector. Find the eigenvalues $\lambda, \bar{\lambda}$ and 
eigenvectors $x, \bar{x}$ of $A = \begin{bmatrix} a & b \\ -b & a \end{bmatrix}$. 

12 The eigenvalues of a real 2 by 2 matrix come from the quadratic formula: 

$$\det\begin{bmatrix} a - \lambda & b \\ c & d - \lambda \end{bmatrix} = \lambda^2 - (a + d)\lambda + (ad - bc) = 0$$

gives the two eigenvalues $\lambda = \left[a + d \pm \sqrt{(a + d)^2 - 4(ad - bc)}\,\right]/2$. 

(a) If $a = b = d = 1$, the eigenvalues are complex when $c$ is ___. 
(b) What are the eigenvalues when $ad = bc$? 

13 In Problem 12 the eigenvalues are not real when $(\text{trace})^2 = (a + d)^2$ is smaller than 
___. Show that the $\lambda$'s are real when $bc > 0$. 

14 A real skew-symmetric matrix ($A^T = -A$) has pure imaginary eigenvalues. First 
proof: If $Ax = \lambda x$ then block multiplication gives 

$$\begin{bmatrix} 0 & A \\ -A & 0 \end{bmatrix}\begin{bmatrix} x \\ ix \end{bmatrix} = i\lambda\begin{bmatrix} x \\ ix \end{bmatrix}.$$

This block matrix is symmetric. Its eigenvalues must be ___! So $\lambda$ is ___. 

Questions 15-22 are about the form $re^{i\theta}$ of the complex number $r\cos\theta + ir\sin\theta$. 

15 Write these numbers in Euler's form $re^{i\theta}$. Then square each number: 

(a) $1 + \sqrt{3}\,i$ (b) $\cos 2\theta + i\sin 2\theta$ (c) $-7i$ (d) $5 - 5i$. 

16 (A favorite) Find the absolute value and the angle for $z = \sin\theta + i\cos\theta$ (careful). 
Locate this $z$ in the complex plane. Multiply $z$ by $\cos\theta + i\sin\theta$ to get ___. 

17 Draw all eight solutions of $z^8 = 1$ in the complex plane. What is the rectangular 
form $a + ib$ of the root $z = \bar{w} = \exp(-2\pi i/8)$? 

18 Locate the cube roots of 1 in the complex plane. Locate the cube roots of $-1$. 
Together these are the sixth roots of ___. 

19 By comparing $e^{3i\theta} = \cos 3\theta + i\sin 3\theta$ with $(e^{i\theta})^3 = (\cos\theta + i\sin\theta)^3$, find the 
"triple angle" formulas for $\cos 3\theta$ and $\sin 3\theta$ in terms of $\cos\theta$ and $\sin\theta$. 

20 Suppose the conjugate $\bar{z}$ is equal to the reciprocal $1/z$. What are all possible $z$'s? 

21 (a) Why do $e^i$ and $i^e$ both have absolute value 1? 
(b) In the complex plane put stars near the points $e^i$ and $i^e$. 
(c) The number $i^e$ could be $(e^{i\pi/2})^e$ or $(e^{5i\pi/2})^e$. Are those equal? 

22 Draw the paths of these numbers from $t = 0$ to $t = 2\pi$ in the complex plane: 

(a) $e^{it}$ (b) $e^{(-1 + i)t} = e^{-t}e^{it}$ (c) $(-1)^t = e^{t\pi i}$. 

9.2 Hermitian and Unitary Matrices 


The main message of this section can be presented in one sentence: When you transpose a 
complex vector $z$ or matrix $A$, take the complex conjugate too. Don't stop at $z^T$ or $A^T$. 
Reverse the signs of all imaginary parts. From a column vector with $z_j = a_j + ib_j$, 
the good row vector $\bar{z}^T$ is the conjugate transpose with components $a_j - ib_j$: 

Conjugate transpose: $\bar{z}^T = [\,\bar{z}_1 \ \cdots \ \bar{z}_n\,] = [\,a_1 - ib_1 \ \cdots \ a_n - ib_n\,]$. (1) 

Here is one reason to go to $\bar{z}$. The length squared of a real vector is $x_1^2 + \cdots + x_n^2$. The 
length squared of a complex vector is not $z_1^2 + \cdots + z_n^2$. With that wrong definition, the 
length of $(1, i)$ would be $1^2 + i^2 = 0$. A nonzero vector would have zero length: not 
good. Other vectors would have complex lengths. Instead of $(a + bi)^2$ we want $a^2 + b^2$, 
the absolute value squared. This is $(a + bi)$ times $(a - bi)$. 

For each component we want $z_j$ times $\bar{z}_j$, which is $|z_j|^2 = a_j^2 + b_j^2$. That comes when 
the components of $z$ multiply the components of $\bar{z}$: 

$$\textbf{Length squared}\qquad [\,\bar{z}_1 \ \cdots \ \bar{z}_n\,]\begin{bmatrix} z_1 \\ \vdots \\ z_n \end{bmatrix} = |z_1|^2 + \cdots + |z_n|^2. \quad\text{This is } \bar{z}^T z = \|z\|^2. \qquad (2)$$

Now the squared length of $(1, i)$ is $1^2 + |i|^2 = 2$. The length is $\sqrt{2}$. The squared length of 
$(1 + i, 1 - i)$ is 4. The only vectors with zero length are zero vectors. 

The length $\|z\|$ is the square root of $\bar{z}^T z = z^H z = |z_1|^2 + \cdots + |z_n|^2$. 
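
The difference between the wrong and the right definition of length shows up immediately in a small experiment. The sketch below is illustrative only; the two function names are invented here, and vectors are plain Python lists of complex numbers.

```python
def wrong_length_squared(z):
    """Sum of z_j^2 with no conjugate: the definition the text warns against."""
    return sum(c * c for c in z)

def length_squared(z):
    """Sum of conj(z_j) * z_j = |z_j|^2, which is always real and nonnegative."""
    return sum((complex(c).conjugate() * c).real for c in z)

v = [1, 1j]                  # the vector (1, i)
w = [1 + 1j, 1 - 1j]         # the vector (1 + i, 1 - i)

print(wrong_length_squared(v))   # zero for a nonzero vector: not good
print(length_squared(v))         # squared length 2
print(length_squared(w))         # squared length 4
```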


Before going further we replace two symbols by one symbol. Instead of a bar for the 
conjugate and T for the transpose, we just use a superscript H. Thus $\bar{z}^T = z^H$. This is 
"$z$ Hermitian," the conjugate transpose of $z$. The new word is pronounced "Hermeeshan." 
The new symbol applies also to matrices: The conjugate transpose of a matrix $A$ is $A^H$. 

Another popular notation is $A^*$. The MATLAB transpose command ' automatically 
takes complex conjugates ($z'$ is $z^H = \bar{z}^T$ and $A'$ is $A^H = \bar{A}^T$): 

$A^H$ is "$A$ Hermitian": If $A = \begin{bmatrix} 1 & i \\ 0 & 1 + i \end{bmatrix}$ then $A^H = \begin{bmatrix} 1 & 0 \\ -i & 1 - i \end{bmatrix}$. 


Complex Inner Products 


For real vectors, the length squared is $x^T x$, the inner product of $x$ with itself. For 
complex vectors, the length squared is $z^H z$. It will be very desirable if $z^H z$ is the 
inner product of $z$ with itself. To make that happen, the complex inner product should use 
the conjugate transpose (not just the transpose). This has no effect on real vectors. 

DEFINITION The inner product of real or complex vectors $u$ and $v$ is $u^H v$: 

$$u^H v = [\,\bar{u}_1 \ \cdots \ \bar{u}_n\,]\begin{bmatrix} v_1 \\ \vdots \\ v_n \end{bmatrix} = \bar{u}_1 v_1 + \cdots + \bar{u}_n v_n. \qquad (3)$$

With complex vectors, $u^H v$ is different from $v^H u$. The order of the vectors is now important. 
In fact $v^H u = \bar{v}_1 u_1 + \cdots + \bar{v}_n u_n$ is the complex conjugate of $u^H v$. We have to put 
up with a few inconveniences for the greater good. 


Example 1 The inner product of $u = \begin{bmatrix} 1 \\ i \end{bmatrix}$ with $v = \begin{bmatrix} i \\ 1 \end{bmatrix}$ is $[\,1 \ \ {-i}\,]\begin{bmatrix} i \\ 1 \end{bmatrix} = 0$. 

Example 1 is surprising. Those vectors $(1, i)$ and $(i, 1)$ don't look perpendicular. But 
they are. A zero inner product still means that the (complex) vectors are orthogonal. 
Similarly the vector $(1, i)$ is orthogonal to the vector $(1, -i)$. Their inner product is $1 - 1 = 0$. 
We are correctly getting zero for the inner product, where we would be incorrectly 
getting zero for the length of $(1, i)$ if we forgot to take the conjugate. 

Note We have chosen to conjugate the first vector $u$. Some authors choose the second 
vector $v$. Their complex inner product would be $u^T \bar{v}$. I think it is a free choice. 


The inner product of $Au$ with $v$ equals the inner product of $u$ with $A^H v$: 

$A^H$ is also called the "adjoint" of $A$: $\qquad (Au)^H v = u^H(A^H v)$. (4) 

The conjugate of $Au$ is $\bar{A}\bar{u}$. Transposing $\bar{A}\bar{u}$ gives $\bar{u}^T \bar{A}^T$ as usual. This is $u^H A^H$. 
Everything that should work, does work. The rule for H comes from the rule for T. 
We constantly use the fact that $(a - ib)(c - id)$ is the conjugate of $(a + ib)(c + id)$. 

The conjugate transpose of $AB$ is $(AB)^H = B^H A^H$. 
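
Both the inner product of Example 1 and the adjoint rule (4) can be checked with a few lines of code. The sketch below works with plain Python lists; the helper names `inner`, `matvec` and `conj_transpose`, and the sample matrix `A`, are assumptions chosen for this illustration.

```python
def inner(u, v):
    """Complex inner product u^H v: conjugate the entries of the first vector."""
    return sum(complex(a).conjugate() * b for a, b in zip(u, v))

def matvec(A, x):
    """Matrix times vector, with A stored as a list of rows."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]

def conj_transpose(A):
    """A^H: transpose A and conjugate every entry."""
    rows, cols = len(A), len(A[0])
    return [[complex(A[i][j]).conjugate() for i in range(rows)] for j in range(cols)]

u = [1, 1j]
v = [1j, 1]
uv = inner(u, v)                 # orthogonal: u^H v = 0
vu = inner(v, u)                 # always the conjugate of u^H v

A = [[1, 1j], [0, 1 + 1j]]       # a sample complex matrix
lhs = inner(matvec(A, u), v)                    # (Au)^H v
rhs = inner(u, matvec(conj_transpose(A), v))    # u^H (A^H v)
print(uv, vu, lhs, rhs)
```

The last two numbers agree, as rule (4) predicts. Swapping `conj_transpose` for a plain transpose would break that agreement for this `A`.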


Hermitian Matrices $S = S^H$ 


Among real matrices, symmetric matrices form the most important special class: $S = S^T$. 
They have real eigenvalues and the orthogonal eigenvectors in an orthogonal matrix $Q$. 
Every real symmetric matrix can be written as $S = Q\Lambda Q^{-1}$ and also as $S = Q\Lambda Q^T$ 
(because $Q^{-1} = Q^T$). All this follows from $S^T = S$, when $S$ is real. 

Among complex matrices, the special class contains the Hermitian matrices: 
$S = S^H$. The condition on the entries is $s_{ij} = \overline{s_{ji}}$. In this case we say that "$S$ is 
Hermitian." Every real symmetric matrix is Hermitian, because taking its conjugate has 
no effect. The next matrix is also Hermitian, $S = S^H$: 

Example 2 $S = \begin{bmatrix} 2 & 3 - 3i \\ 3 + 3i & 5 \end{bmatrix}$. The main diagonal must be real since $s_{ii} = \overline{s_{ii}}$. 
Across it are conjugates $3 + 3i$ and $3 - 3i$. 


This example will illustrate the three crucial properties of all Hermitian matrices. 
If $S = S^H$ and $z$ is any real or complex column vector, the number $z^H S z$ is real. 

Quick proof: $z^H S z$ is certainly 1 by 1. Take its conjugate transpose: 
$(z^H S z)^H = z^H S^H (z^H)^H$ which is $z^H S z$ again. 
So the number $z^H S z$ equals its conjugate and must be real. Here is that "energy" $z^H S z$: 

$$[\,\bar{z}_1 \ \ \bar{z}_2\,]\begin{bmatrix} 2 & 3 - 3i \\ 3 + 3i & 5 \end{bmatrix}\begin{bmatrix} z_1 \\ z_2 \end{bmatrix} = \underbrace{2\bar{z}_1 z_1 + 5\bar{z}_2 z_2}_{\text{diagonal}} + \underbrace{(3 - 3i)\bar{z}_1 z_2 + (3 + 3i) z_1 \bar{z}_2}_{\text{off-diagonal}}.$$

The terms $2|z_1|^2$ and $5|z_2|^2$ from the diagonal are both real. The off-diagonal terms are 
conjugates of each other, so their sum is real. (The imaginary parts cancel when we add.) 
The whole expression $z^H S z$ is real, and this will make $\lambda$ real. 


Every eigenvalue of a Hermitian matrix is real. 


Proof Suppose $Sz = \lambda z$. Multiply both sides by $z^H$ to get $z^H S z = \lambda z^H z$. On the left 
side, $z^H S z$ is real. On the right side, $z^H z$ is the length squared, real and positive. So the 
ratio $\lambda = z^H S z / z^H z$ is a real number. Q.E.D. 

The example above has eigenvalues $\lambda = 8$ and $\lambda = -1$, real because $S = S^H$: 

$$\begin{vmatrix} 2 - \lambda & 3 - 3i \\ 3 + 3i & 5 - \lambda \end{vmatrix} = \lambda^2 - 7\lambda + 10 - |3 + 3i|^2 = \lambda^2 - 7\lambda + 10 - 18 = (\lambda - 8)(\lambda + 1).$$


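The eigenvectors of a Hermitian matrix are orthogonal (when they correspond to 
different eigenvalues). If $Sz = \lambda z$ and $Sy = \beta y$ then $y^H z = 0$. 

Proof Multiply $Sz = \lambda z$ on the left by $y^H$. Multiply $y^H S^H = \beta y^H$ on the right by $z$: 

$$y^H S z = \lambda y^H z \quad\text{and}\quad y^H S^H z = \beta y^H z. \qquad (5)$$

The left sides are equal (because $S = S^H$) so $\lambda y^H z = \beta y^H z$. Since $\lambda \ne \beta$, $y^H z$ must be zero. 

The eigenvectors are orthogonal in our example with $\lambda = 8$ and $\beta = -1$: 

$$(S - 8I)z = \begin{bmatrix} -6 & 3 - 3i \\ 3 + 3i & -3 \end{bmatrix}\begin{bmatrix} z_1 \\ z_2 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix} \qquad z = \begin{bmatrix} 1 \\ 1 + i \end{bmatrix}$$

$$(S + I)y = \begin{bmatrix} 3 & 3 - 3i \\ 3 + 3i & 6 \end{bmatrix}\begin{bmatrix} y_1 \\ y_2 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix} \qquad y = \begin{bmatrix} 1 - i \\ -1 \end{bmatrix}$$

$$\textbf{Orthogonal eigenvectors}\qquad y^H z = [\,1 + i \ \ {-1}\,]\begin{bmatrix} 1 \\ 1 + i \end{bmatrix} = 0.$$

These eigenvectors have squared length $1^2 + 1^2 + 1^2 = 3$. After division by $\sqrt{3}$ they are 
unit vectors. They were orthogonal, now they are orthonormal. They go into the columns 
of the eigenvector matrix $X$, which diagonalizes $S$. 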
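
The eigenvalues 8 and $-1$ and the orthogonality of their eigenvectors can be confirmed directly for the matrix of Example 2. This sketch uses plain Python lists; the helper names `matvec` and `inner` are assumptions made for the illustration.

```python
def matvec(A, x):
    """Matrix times vector, with A stored as a list of rows."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]

def inner(u, v):
    """Complex inner product u^H v (conjugate the first vector)."""
    return sum(complex(a).conjugate() * b for a, b in zip(u, v))

S = [[2, 3 - 3j], [3 + 3j, 5]]       # the Hermitian matrix of Example 2
z = [1, 1 + 1j]                      # eigenvector for lambda = 8
y = [1 - 1j, -1]                     # eigenvector for beta = -1

Sz = matvec(S, z)                    # should equal 8 z
Sy = matvec(S, y)                    # should equal -1 y
yz = inner(y, z)                     # should be zero
energy = inner(z, Sz)                # z^H S z, a real number
print(Sz, Sy, yz, energy)
```

The energy $z^H S z$ comes out as $8\,z^H z = 24$ with zero imaginary part, in line with the proof that every eigenvalue is real.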

When $S$ is real and symmetric, $X$ is $Q$, an orthogonal matrix. Now $S$ is complex and 
Hermitian. Its eigenvectors are complex and orthonormal. The eigenvector matrix $X$ is 
like $Q$, but complex: $Q^H Q = I$. We assign $Q$ a new name "unitary" but still call it $Q$. 


Unitary Matrices 


A unitary matrix Q is a (complex) square matrix that has orthonormal columns. 


Unitary matrix that diagonalizes $S$: $\qquad Q = \dfrac{1}{\sqrt{3}}\begin{bmatrix} 1 & 1 - i \\ 1 + i & -1 \end{bmatrix}$ 

This $Q$ is also a Hermitian matrix. I didn't expect that! The example is almost too perfect. 
We will see that the eigenvalues of this $Q$ must be $1$ and $-1$. 
The matrix test for real orthonormal columns was $Q^T Q = I$. The zero inner products 
appear off the diagonal. In the complex case, $Q^T$ becomes $Q^H$. The columns show 
themselves as orthonormal when $Q^H$ multiplies $Q$. The inner products fill up $Q^H Q = I$: 

Every matrix $Q$ with orthonormal columns has $Q^H Q = I$. 

If $Q$ is square, it is a unitary matrix. Then $Q^H = Q^{-1}$. 


Suppose $Q$ (with orthonormal columns) multiplies any $z$. The vector length stays the 
same, because $z^H Q^H Q z = z^H z$. If $z$ is an eigenvector of $Q$ we learn something more: 
The eigenvalues of unitary (and orthogonal) matrices $Q$ all have absolute value $|\lambda| = 1$. 

If $Q$ is unitary then $\|Qz\| = \|z\|$. Therefore $Qz = \lambda z$ leads to $|\lambda| = 1$. 

Our 2 by 2 example is both Hermitian ($Q = Q^H$) and unitary ($Q^{-1} = Q^H$). That 
means real eigenvalues and it means $|\lambda| = 1$. A real number with $|\lambda| = 1$ has only two 
possibilities: The eigenvalues are $1$ or $-1$. The trace of $Q$ is zero so $\lambda = 1$ and $\lambda = -1$. 

Example 3 The 3 by 3 Fourier matrix is in Figure 9.3. Is it Hermitian? Is it unitary? 
$F_3$ is certainly symmetric. It equals its transpose. But it doesn't equal its conjugate 
transpose: it is not Hermitian. If you change $i$ to $-i$, you get a different matrix. 


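$$\textbf{Fourier matrix}\qquad F = \frac{1}{\sqrt{3}}\begin{bmatrix} 1 & 1 & 1 \\ 1 & e^{2\pi i/3} & e^{4\pi i/3} \\ 1 & e^{4\pi i/3} & e^{2\pi i/3} \end{bmatrix}$$

[Figure 9.3 marks the three cube roots of 1 on the unit circle: $1$, $e^{2\pi i/3}$ and $e^{4\pi i/3}$.] 

Figure 9.3: The cube roots of 1 go into the Fourier matrix $F = F_3$. 

Is $F$ unitary? Yes. The squared length of every column is $\tfrac{1}{3}(1 + 1 + 1)$ (unit vector). 
The first column is orthogonal to the second column because $1 + e^{2\pi i/3} + e^{4\pi i/3} = 0$. 
This is the sum of the three numbers marked in Figure 9.3. 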

Notice the symmetry of the figure. If you rotate it by 120°, the three points are in the 
same position. Therefore their sum S also stays in the same position! The only possible 
sum in the same position after 120° rotation is S = 0. 

Is column 2 of $F$ orthogonal to column 3? Their dot product looks like 

$$\tfrac{1}{3}\left(1 + e^{6\pi i/3} + e^{6\pi i/3}\right) = \tfrac{1}{3}(1 + 1 + 1).$$

This is not zero. The answer is wrong because we forgot to take complex conjugates. The 
complex inner product uses H not T: 

$$(\text{column } 2)^H(\text{column } 3) = \tfrac{1}{3}\left(1 \cdot 1 + e^{-2\pi i/3}e^{4\pi i/3} + e^{-4\pi i/3}e^{2\pi i/3}\right) = \tfrac{1}{3}\left(1 + e^{2\pi i/3} + e^{-2\pi i/3}\right) = 0.$$

So we do have orthogonality. Conclusion: $F$ is a unitary matrix. 

The next section will study the $n$ by $n$ Fourier matrices. Among all complex unitary 
matrices, these are the most important. When we multiply a vector by $F$, we are computing 
its Discrete Fourier Transform. When we multiply by $F^{-1}$, we are computing the 
inverse transform. The special property of unitary matrices is that $F^{-1} = F^H$. The inverse 
transform only differs by changing $i$ to $-i$: 

$$\textbf{Change } i \textbf{ to } -i\qquad F^{-1} = F^H = \frac{1}{\sqrt{3}}\begin{bmatrix} 1 & 1 & 1 \\ 1 & e^{-2\pi i/3} & e^{-4\pi i/3} \\ 1 & e^{-4\pi i/3} & e^{-2\pi i/3} \end{bmatrix}$$

Everyone who works with $F$ recognizes its value. The last section of this chapter will bring 
together Fourier analysis and complex numbers and linear algebra. 

Problem Set 9.2 


1 Find the lengths of $u = (1 + i,\ 1 - i,\ 1 + 2i)$ and $v = (i, i, i)$. Find $u^H v$ and $v^H u$. 

2 Compute $A^H A$ and $AA^H$. Those are both ___ matrices: 

3 Solve $Az = 0$ to find a vector $z$ in the nullspace of $A$ in Problem 2. Show that $z$ is 
orthogonal to the columns of $A^H$. Show that $z$ is not orthogonal to the columns of 
$A^T$. The good row space is no longer $C(A^T)$. Now it is $C(A^H)$. 

4 Problem 3 indicates that the four fundamental subspaces are $C(A)$ and $N(A)$ and 
___ and ___. Their dimensions are still $r$ and $n - r$ and $r$ and $m - r$. They are 
still orthogonal subspaces. The symbol H takes the place of T. 

5 (a) Prove that $A^H A$ is always a Hermitian matrix. 

(b) If $Az = 0$ then $A^H Az = 0$. If $A^H Az = 0$, multiply by $z^H$ to prove that 
$Az = 0$. The nullspaces of $A$ and $A^H A$ are ___. Therefore $A^H A$ is an 
invertible Hermitian matrix when the nullspace of $A$ contains only $z = 0$. 

6 True or false (give a reason if true or a counterexample if false): 

(a) If $A$ is a real matrix then $A + iI$ is invertible. 
(b) If $S$ is a Hermitian matrix then $S + iI$ is invertible. 
(c) If $Q$ is a unitary matrix then $Q + iI$ is invertible. 

7 When you multiply a Hermitian matrix by a real number $c$, is $cS$ still Hermitian? 
Show that $iS$ is skew-Hermitian when $S$ is Hermitian. The 3 by 3 Hermitian matrices 
are a subspace provided the "scalars" are real numbers. 

8 Which classes of matrices does $P$ belong to: invertible, Hermitian, unitary? 

$$P = \begin{bmatrix} 0 & i & 0 \\ 0 & 0 & i \\ i & 0 & 0 \end{bmatrix}$$

Compute $P^2$, $P^3$, and $P^{100}$. What are the eigenvalues of $P$? 

9 Find the unit eigenvectors of $P$ in Problem 8, and put them into the columns of a 
unitary matrix $Q$. What property of $P$ makes these eigenvectors orthogonal? 

10 Write down the 3 by 3 circulant matrix $C = 2I + 5P$. It has the same eigenvectors 
as $P$ in Problem 8. Find its eigenvalues. 


If Q and U are unitary matrices, show that QT! is unitary and also QU is unitary. 
Start from QUQ = I and UĦU =T. 
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12 
13 


14 


15 


16 


17 


18 


19 
20 
21 
22 
23 
24 


25 


26 
27 
28 
How do you know that the determinant of every Hermitian matrix is real? 


The matrix $A^H A$ is not only Hermitian but also positive definite, when the columns 
of $A$ are independent. Proof: $z^H A^H A z$ is positive if $z$ is nonzero because ____. 


Diagonalize these Hermitian matrices to reach $S = Q \Lambda Q^H$: 


_[ 0 1-4 _[ 2 143 
$2150, i? a e i 


Diagonalize this skew-Hermitian matrix to reach $K = Q \Lambda Q^H$. All $\lambda$’s are ____. 


ics oe = 
1+4% 1 
Diagonalize this orthogonal matrix to reach $U = Q \Lambda Q^H$. Now all $\lambda$’s are ____. 


$$U = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix}$$


Diagonalize this unitary matrix to reach $U = Q \Lambda Q^H$. Again all $\lambda$’s are ____. 


$$U = \frac{1}{\sqrt{3}} \begin{bmatrix} 1 & 1-i \\ 1+i & -1 \end{bmatrix}$$
If $v_1, \ldots, v_n$ is an orthonormal basis for $C^n$, the matrix with those columns is a 
____ matrix. Show that any vector $z$ equals $(v_1^H z)v_1 + \cdots + (v_n^H z)v_n$. 
$v = (1, i, 1)$, $w = (i, 1, 0)$ and $z = $ ____ are an orthogonal basis for ____. 
If S = A + iB is a Hermitian matrix, are its real and imaginary parts symmetric? 
The (complex) dimension of $C^n$ is ____. Find a non-real basis for $C^n$. 
Describe all 1 by 1 and 2 by 2 Hermitian matrices and unitary matrices. 
How are the eigenvalues of $A^H$ related to the eigenvalues of the square matrix $A$? 


If $u^H u = 1$ show that $I - 2uu^H$ is Hermitian and also unitary. The rank-one matrix 
$uu^H$ is the projection onto what line in $C^n$? 


If $A + iB$ is a unitary matrix ($A$ and $B$ are real) show that $Q = \begin{bmatrix} A & -B \\ B & A \end{bmatrix}$ is an 
orthogonal matrix. 
If $A + iB$ is Hermitian ($A$ and $B$ are real) show that $\begin{bmatrix} A & -B \\ B & A \end{bmatrix}$ is symmetric. 
Prove that the inverse of a Hermitian matrix is also Hermitian (transpose $S^{-1}S = I$). 


A matrix with orthonormal eigenvectors has the form $N = Q \Lambda Q^{-1} = Q \Lambda Q^H$. 
Prove that $NN^H = N^H N$. These $N$ are exactly the normal matrices. Examples 
are Hermitian, skew-Hermitian, and unitary matrices. Construct a 2 by 2 normal 
matrix from $Q \Lambda Q^H$ by choosing complex eigenvalues in $\Lambda$. 
9.3 The Fast Fourier Transform 


Many applications of linear algebra take time to develop. It is not easy to explain them 
in an hour. The teacher and the author must choose between completing the theory and 
adding new applications. Often the theory wins, but this section is an exception. It explains 
the most valuable numerical algorithm in the last century. 

We want to multiply quickly by $F$ and $F^{-1}$, the Fourier matrix and its inverse. This 
is achieved by the Fast Fourier Transform. An ordinary product $Fc$ uses $n^2$ multiplications 
($F$ has $n^2$ entries). The FFT needs only $n$ times $\frac{1}{2}\log_2 n$. We will see how. 

The FFT has revolutionized signal processing. Whole industries are speeded up by this 
one idea. Electrical engineers are the first to know the difference—they take your Fourier 
transform as they meet you (if you are a function). Fourier’s idea is to represent $f$ as a 
sum of harmonics $c_k e^{ikx}$. The function is seen in frequency space through the coefficients 
$c_k$, instead of physical space through its values $f(x)$. The passage backward and forward 
between $c$’s and $f$’s is by the Fourier transform. Fast passage is by the FFT. 


Roots of Unity and the Fourier Matrix 


Quadratic equations have two roots (or one repeated root). Equations of degree n have n 
roots (counting repetitions). This is the Fundamental Theorem of Algebra, and to make it 
true we must allow complex roots. This section is about the very special equation $z^n = 1$. 
The solutions $z$ are the “$n$th roots of unity.” They are $n$ evenly spaced points around the 
unit circle in the complex plane. 

Figure 9.4 shows the eight solutions to $z^8 = 1$. Their spacing is $\frac{1}{8}(360°) = 45°$. The 
first root is at 45° or $\theta = 2\pi/8$ radians. It is the complex number $w = e^{i\theta} = e^{i2\pi/8}$. 


We call this number $w_8$ to emphasize that it is an 8th root. You could write it in terms 
of $\cos\frac{2\pi}{8}$ and $\sin\frac{2\pi}{8}$, but don’t do it. The seven other 8th roots are $w^2, w^3, \ldots, w^8$, going 
around the circle. Powers of $w$ are best in polar form, because we work only with the angles 
$\frac{2\pi}{8}, \frac{4\pi}{8}, \ldots, \frac{16\pi}{8} = 2\pi$. Those 8 angles in degrees are 45°, 90°, 135°, ..., 360°. 


Figure 9.4: The eight solutions to $z^8 = 1$ are $1, w, w^2, \ldots, w^7$ with $w = e^{2\pi i/8} = \cos\frac{2\pi}{8} + i\sin\frac{2\pi}{8} = (1 + i)/\sqrt{2}$. 
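
The roots of unity can be checked numerically. The following is a minimal sketch using Python's standard `cmath` module; the function name `roots_of_unity` is an illustrative choice, not notation from the text. It builds the $n$ roots as powers of $w = e^{2\pi i/n}$ and lets us confirm that $w_8 = (1+i)/\sqrt{2}$.

```python
import cmath

def roots_of_unity(n):
    """Return the n solutions of z**n = 1 as powers 1, w, w**2, ..., w**(n-1)."""
    w = cmath.exp(2j * cmath.pi / n)   # first root past 1, at angle 2*pi/n
    return [w ** k for k in range(n)]

roots8 = roots_of_unity(8)
w8 = roots8[1]                          # expected to be close to (1 + i)/sqrt(2)
roots4 = roots_of_unity(4)              # expected to be close to 1, i, -1, -i
```

Because the roots are evenly spaced around the circle, their sum should come out as zero up to roundoff.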
The fourth roots of 1 are also in the figure. They are $i, -1, -i, 1$. The angle is now 
$2\pi/4$ or 90°. The first root $w_4 = e^{2\pi i/4}$ is nothing but $i$. Even the square roots of 1 
are seen, with $w_2 = e^{2\pi i/2} = -1$. Do not despise those square roots 1 and −1. The 
idea behind the FFT is to go from an 8 by 8 Fourier matrix (containing powers of $w_8$) 
to the 4 by 4 matrix below (with powers of $w_4 = i$). The same idea goes from 4 to 2. 
By exploiting the connections of $F_8$ down to $F_4$ and up to $F_{16}$ (and beyond), the FFT 
makes multiplication by $F_{1024}$ very quick. 

We describe the Fourier matrix, first for $n = 4$. Its rows contain powers of 1 and $w$ and 
$w^2$ and $w^3$. These are the fourth roots of 1, and their powers come in a special order. 


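$$\text{Fourier matrix, } n = 4,\ w = i \qquad F = \begin{bmatrix} 1 & 1 & 1 & 1 \\ 1 & w & w^2 & w^3 \\ 1 & w^2 & w^4 & w^6 \\ 1 & w^3 & w^6 & w^9 \end{bmatrix} = \begin{bmatrix} 1 & 1 & 1 & 1 \\ 1 & i & i^2 & i^3 \\ 1 & i^2 & i^4 & i^6 \\ 1 & i^3 & i^6 & i^9 \end{bmatrix}$$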


The matrix is symmetric ($F = F^T$). It is not Hermitian. Its main diagonal is not real. But 
$\frac{1}{2}F$ is a unitary matrix, which means that $(\frac{1}{2}F^H)(\frac{1}{2}F) = I$: 


The columns of $F$ give $F^H F = 4I$. Its inverse is $\frac{1}{4}F^H$ which is $F^{-1} = \frac{1}{4}\overline{F}$. 


The inverse changes from $w = i$ to $\overline{w} = -i$. That takes us from $F$ to $\overline{F}$. When the Fast 
Fourier Transform gives a quick way to multiply by $F$, it does the same for $\overline{F}$ and $F^{-1}$. 

Every column has length $\sqrt{n}$. So the unitary matrices are $Q = F/\sqrt{n}$ and $Q^{-1} = 
\overline{F}/\sqrt{n}$. We avoid $\sqrt{n}$ and just use $F$ and $F^{-1} = \overline{F}/n$. The main point is to multiply $F$ 
times $c_0, c_1, c_2, c_3$: 


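$$\text{4-point Fourier series} \qquad \begin{bmatrix} y_0 \\ y_1 \\ y_2 \\ y_3 \end{bmatrix} = Fc = \begin{bmatrix} 1 & 1 & 1 & 1 \\ 1 & w & w^2 & w^3 \\ 1 & w^2 & w^4 & w^6 \\ 1 & w^3 & w^6 & w^9 \end{bmatrix} \begin{bmatrix} c_0 \\ c_1 \\ c_2 \\ c_3 \end{bmatrix} \qquad (1)$$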


The input is four complex coefficients $c_0, c_1, c_2, c_3$. The output is four function values 
$y_0, y_1, y_2, y_3$. The first output $y_0 = c_0 + c_1 + c_2 + c_3$ is the value of the Fourier series 
$\sum c_k e^{ikx}$ at $x = 0$. The second output is the value of that series $\sum c_k e^{ikx}$ at $x = 2\pi/4$: 


$$y_1 = c_0 + c_1 e^{i2\pi/4} + c_2 e^{i4\pi/4} + c_3 e^{i6\pi/4} = c_0 + c_1 w + c_2 w^2 + c_3 w^3.$$


The third and fourth outputs $y_2$ and $y_3$ are the values of $\sum c_k e^{ikx}$ at $x = 4\pi/4$ and 
$x = 6\pi/4$. These are finite Fourier series! They contain $n = 4$ terms and they are evaluated 
at $n = 4$ points. Those points $x = 0, 2\pi/4, 4\pi/4, 6\pi/4$ are equally spaced. 

The next point would be $x = 8\pi/4$ which is $2\pi$. Then the series is back to $y_0$, because 
$e^{2\pi i}$ is the same as $e^0 = 1$. Everything cycles around with period 4. In this world $2 + 2$ is 0 
because $(w^2)(w^2) = w^0 = 1$. We follow the convention that $j$ and $k$ go from 0 to $n - 1$ 
(instead of 1 to $n$). The “zeroth row” and “zeroth column” of $F$ contain all ones. 
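The $n$ by $n$ Fourier matrix contains powers of $w = e^{2\pi i/n}$: 


$$F_n c = \begin{bmatrix} 1 & 1 & 1 & \cdot & 1 \\ 1 & w & w^2 & \cdot & w^{n-1} \\ 1 & w^2 & w^4 & \cdot & w^{2(n-1)} \\ \cdot & \cdot & \cdot & \cdot & \cdot \\ 1 & w^{n-1} & w^{2(n-1)} & \cdot & w^{(n-1)^2} \end{bmatrix} \begin{bmatrix} c_0 \\ c_1 \\ c_2 \\ \cdot \\ c_{n-1} \end{bmatrix} = \begin{bmatrix} y_0 \\ y_1 \\ y_2 \\ \cdot \\ y_{n-1} \end{bmatrix} = y \qquad (2)$$


$F_n$ is symmetric but not Hermitian. Its columns are orthogonal, and $F_n \overline{F}_n = nI$. Then 
$F_n^{-1}$ is $\overline{F}_n/n$. The inverse contains powers of $\overline{w}_n = e^{-2\pi i/n}$. Look at the pattern in $F$: 


The entry in row $j$, column $k$ is $w^{jk}$. Row zero and column zero contain $w^0 = 1$. 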
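
The pattern $w^{jk}$ and the orthogonality of the columns can be confirmed with a few lines of code. This is a small sketch in plain Python; the helper names `fourier_matrix` and `times_conjugate` are illustrative assumptions. It builds $F_n$ entry by entry and forms the product $F\overline{F}$, which should be $nI$ up to roundoff.

```python
import cmath

def fourier_matrix(n):
    """n by n Fourier matrix with entries w**(j*k), rows and columns numbered from 0."""
    w = cmath.exp(2j * cmath.pi / n)
    return [[w ** (j * k) for k in range(n)] for j in range(n)]

def times_conjugate(F):
    """Return the product F * conj(F) as a list of rows."""
    n = len(F)
    return [[sum(F[j][p] * F[p][k].conjugate() for p in range(n))
             for k in range(n)] for j in range(n)]

F4 = fourier_matrix(4)
G = times_conjugate(F4)   # expected to be close to 4I
```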


When we multiply $c$ by $F_n$, we sum the series at $n$ points. When we multiply $y$ by $F_n^{-1}$, we 
find the coefficients $c$ from the function values $y$. In MATLAB that command is c = fft(y)/n, 
since fft(y) multiplies by $\overline{F}_n$ without the factor $1/n$. 
The matrix $F$ passes from “frequency space” to “physical space.” 


Important note. Many authors prefer to work with $\omega = e^{-2\pi i/N}$, which is the complex 
conjugate of our $w$. (They often use the Greek omega, and I will do that to keep the two 
options separate.) With this choice, their DFT matrix contains powers of $\omega$ not $w$. It is $\overline{F}$, 
the conjugate of our $F$. $\overline{F}$ goes from physical space to frequency space. 

$\overline{F}$ is a completely reasonable choice! MATLAB uses $\omega = e^{-2\pi i/N}$. The DFT matrix 
fft(eye(N)) contains powers of this number $\omega = \overline{w}$. The Fourier matrix $F$ with $w$’s 
reconstructs $y$ from $c$. The matrix $\overline{F}$ with $\omega$’s computes Fourier coefficients as fft(y). 
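
The two conventions can be placed side by side in a short sketch. The names `synthesize` and `analyze` below are hypothetical labels for the two directions, and the $O(n^2)$ sums are written out directly rather than computed by a fast transform. Going from $c$ to $y$ uses powers of $w$; going back uses powers of $\omega = \overline{w}$ and the factor $1/n$.

```python
import cmath

def synthesize(c):
    """y = F c: sum the series at n points, using w = exp(+2*pi*i/n)."""
    n = len(c)
    w = cmath.exp(2j * cmath.pi / n)
    return [sum(c[k] * w ** (j * k) for k in range(n)) for j in range(n)]

def analyze(y):
    """c = conj(F) y / n: recover the coefficients, using omega = exp(-2*pi*i/n)."""
    n = len(y)
    omega = cmath.exp(-2j * cmath.pi / n)
    return [sum(y[j] * omega ** (j * k) for j in range(n)) / n for k in range(n)]

c = [1, 2j, -1, 0.5]
y = synthesize(c)        # function values (physical space)
c_back = analyze(y)      # coefficients again (frequency space)
```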


Also important. When a function $f(x)$ has period $2\pi$, and we change $x$ to $e^{i\theta}$, 
the function is defined around the unit circle (where $z = e^{i\theta}$). The Discrete Fourier 
Transform is the same as interpolation. Find the polynomial $p(z) = c_0 + c_1 z + \cdots + 
c_{n-1}z^{n-1}$ that matches $n$ values $f_0, \ldots, f_{n-1}$: 


Interpolation Find $c_0, \ldots, c_{n-1}$ so that $p(z) = f$ at $n$ points $z = 1, \ldots, w^{n-1}$ 


The Fourier matrix is the Vandermonde matrix for interpolation at those n special points. 


One Step of the Fast Fourier Transform 


We want to multiply F times c as quickly as possible. Normally a matrix times a vector 
takes n? separate multiplications—the matrix has n? entries. You might think it is impos- 
sible to do better. (If the matrix has zero entries then multiplications can be skipped. But 
the Fourier matrix has no zeros!) By using the special pattern $w^{jk}$ for its entries, $F$ can be 
factored in a way that produces many zeros. This is the FFT. 

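The key idea is to connect $F_n$ with the half-size Fourier matrix $F_{n/2}$. Assume that 
$n$ is a power of 2 (say $n = 2^{10} = 1024$). We will connect $F_{1024}$ to two copies of $F_{512}$. 
When $n = 4$, the key is in the relation between $F_4$ and two copies of $F_2$: 


$$F_4 = \begin{bmatrix} 1 & 1 & 1 & 1 \\ 1 & i & i^2 & i^3 \\ 1 & i^2 & i^4 & i^6 \\ 1 & i^3 & i^6 & i^9 \end{bmatrix} \quad \text{and} \quad \begin{bmatrix} F_2 & \\ & F_2 \end{bmatrix} = \begin{bmatrix} 1 & 1 & & \\ 1 & i^2 & & \\ & & 1 & 1 \\ & & 1 & i^2 \end{bmatrix}$$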


On the left is F4, with no zeros. On the right is a matrix that is half zero. The work 
is cut in half. But wait, those matrices are not the same. We need two sparse and simple 
matrices to complete the FFT factorization: 


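$$\text{Factors for FFT} \qquad F_4 = \begin{bmatrix} 1 & & 1 & \\ & 1 & & i \\ 1 & & -1 & \\ & 1 & & -i \end{bmatrix} \begin{bmatrix} 1 & 1 & & \\ 1 & i^2 & & \\ & & 1 & 1 \\ & & 1 & i^2 \end{bmatrix} \begin{bmatrix} 1 & & & \\ & & 1 & \\ & 1 & & \\ & & & 1 \end{bmatrix} \qquad (3)$$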

The last matrix is a permutation. It puts the even $c$’s ($c_0$ and $c_2$) ahead of the odd $c$’s ($c_1$ 
and $c_3$). The middle matrix performs half-size transforms $F_2$ and $F_2$ on the even $c$’s and 
odd $c$’s. The matrix at the left combines the two half-size outputs, in a way that produces 
the correct full-size output $y = F_4 c$. 
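
The three factors in equation (3) can be multiplied out by machine as a check. In this sketch the variable names (`combine`, `half_size`, `even_odd`) are illustrative labels for the left, middle, and right factors, and `matmul` is a small helper written here for lists of rows.

```python
def matmul(A, B):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(A[r][p] * B[p][s] for p in range(len(B)))
             for s in range(len(B[0]))] for r in range(len(A))]

i = 1j
combine = [[1, 0, 1, 0], [0, 1, 0, i], [1, 0, -1, 0], [0, 1, 0, -i]]        # [I D; I -D]
half_size = [[1, 1, 0, 0], [1, i**2, 0, 0], [0, 0, 1, 1], [0, 0, 1, i**2]]  # two copies of F_2
even_odd = [[1, 0, 0, 0], [0, 0, 1, 0], [0, 1, 0, 0], [0, 0, 0, 1]]         # evens before odds

product = matmul(combine, matmul(half_size, even_odd))
F4 = [[i ** (j * k) for k in range(4)] for j in range(4)]   # entries i^(jk)
```

The product should agree with $F_4$ entry by entry, even though each factor is half zero.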

The same idea applies when $n = 1024$ and $m = \frac{1}{2}n = 512$. The number $w$ is $e^{2\pi i/1024}$. 
It is at the angle $\theta = 2\pi/1024$ on the unit circle. The Fourier matrix $F_{1024}$ is full of powers 
of $w$. The first stage of the FFT is the great factorization discovered by Cooley and Tukey 
(and foreshadowed in 1805 by Gauss): 


$$F_{1024} = \begin{bmatrix} I_{512} & D_{512} \\ I_{512} & -D_{512} \end{bmatrix} \begin{bmatrix} F_{512} & \\ & F_{512} \end{bmatrix} \begin{bmatrix} \text{even-odd} \\ \text{permutation} \end{bmatrix} \qquad (4)$$


$I_{512}$ is the identity matrix. $D_{512}$ is the diagonal matrix with entries $(1, w, \ldots, w^{511})$. The 
two copies of $F_{512}$ are what we expected. Don’t forget that they use the 512th root of unity 
(which is nothing but $w^2$!!) The permutation matrix separates the incoming vector $c$ into 
its even and odd parts $c' = (c_0, c_2, \ldots, c_{1022})$ and $c'' = (c_1, c_3, \ldots, c_{1023})$. 

Here are the algebra formulas which say the same thing as that factorization of $F_{1024}$: 


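(One step of the FFT) Set $m = \frac{1}{2}n$. The first $m$ and last $m$ components of $y = F_n c$ 
combine the half-size transforms $y' = F_m c'$ and $y'' = F_m c''$. Equation (4) shows this 
step from $n$ to $m = n/2$ as $Iy' + Dy''$ and $Iy' - Dy''$: 


$$\begin{aligned} y_j &= y'_j + (w_n)^j\, y''_j, & j &= 0, \ldots, m-1 \\ y_{j+m} &= y'_j - (w_n)^j\, y''_j, & j &= 0, \ldots, m-1. \end{aligned} \qquad (5)$$

Split $c$ into $c'$ and $c''$, transform them by $F_m$ into $y'$ and $y''$, then (5) reconstructs $y$. 
Those formulas come from separating $c_0, \ldots, c_{n-1}$ into even $c_{2k}$ and odd $c_{2k+1}$: $w$ is $w_n$. 


$$y = Fc \qquad y_j = \sum_{k=0}^{n-1} w^{jk} c_k = \sum_{k=0}^{m-1} w^{2jk} c_{2k} + \sum_{k=0}^{m-1} w^{j(2k+1)} c_{2k+1} \quad \text{with } m = \tfrac{1}{2}n. \qquad (6)$$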
The even $c$’s go into $c' = (c_0, c_2, \ldots)$ and the odd $c$’s go into $c'' = (c_1, c_3, \ldots)$. Then 
come the transforms $F_m c'$ and $F_m c''$. The key is $w_n^2 = w_m$. This gives $w_n^{2jk} = w_m^{jk}$. 


$$\text{Rewrite (6)} \qquad y_j = \sum (w_m)^{jk} c'_k + (w_n)^j \sum (w_m)^{jk} c''_k = y'_j + (w_n)^j y''_j. \qquad (7)$$


For $j \ge m$, the minus sign in (5) comes from factoring out $(w_n)^m = -1$ from $(w_n)^j$. 

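MATLAB easily separates even $c$’s from odd $c$’s and multiplies by $w_n^j$. We use conj(F) 
or equivalently MATLAB’s inverse transform ifft, because fft is based on $\omega = \overline{w} = e^{-2\pi i/n}$. 
Problem 16 shows that $F$ and conj(F) are linked by permuting rows. 


FFT step from n to n/2 in MATLAB (c is a column vector of even length n; MATLAB indices start at 1, so c(1) holds $c_0$): 

    w  = exp(2*pi*1i/n);            % w_n, the first nth root of unity
    ye = ifft(c(1:2:n-1))*n/2;      % y'  = F_m c'  from c_0, c_2, ...
    yo = ifft(c(2:2:n))*n/2;        % y'' = F_m c'' from c_1, c_3, ...
    d  = w.^(0:n/2-1).';            % diagonal of D as a column (no conjugation)
    y  = [ye + d.*yo; ye - d.*yo];  % equation (5)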


The flow graph shows $c'$ and $c''$ going through the half-size $F_2$. Those steps are called 
“butterflies,” from their shape. Then the outputs $y'$ and $y''$ are combined (multiplying $y''$ 
by $1, i$ from $D$ and also by $-1, -i$ from $-D$) to produce $y = F_4 c$. 

This reduction from $F_n$ to two $F_m$’s almost cuts the work in half: you see the zeros in 
the matrix factorization. That reduction is good but not great. The full idea of the FFT is 
much more powerful. It saves much more than half the time. 


[Figure: FFT flow graph for $n = 4$. The inputs enter in the binary order 00, 10, 01, 11 (first $c'$, then $c''$).] 


The Full FFT by Recursion 


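If you have read this far, you probably guessed what comes next. We reduced $F_n$ to $F_{n/2}$. 
Keep going to $F_{n/4}$. Every $F_{512}$ leads to $F_{256}$. Then 256 leads to 128. That is recursion. 
Recursion is a basic principle of many fast algorithms. Here is step 2 with four 
copies of $F_{256}$ and $D$ (256 powers of $w_{512}$). Evens of evens $c_0, c_4, c_8, \ldots$ come first: 


$$\begin{bmatrix} F_{512} & \\ & F_{512} \end{bmatrix} = \begin{bmatrix} I & D & & \\ I & -D & & \\ & & I & D \\ & & I & -D \end{bmatrix} \begin{bmatrix} F & & & \\ & F & & \\ & & F & \\ & & & F \end{bmatrix} \begin{bmatrix} \text{pick } 0, 4, 8, \ldots \\ \text{pick } 2, 6, 10, \ldots \\ \text{pick } 1, 5, 9, \ldots \\ \text{pick } 3, 7, 11, \ldots \end{bmatrix}$$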
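
The recursion can be written in a few lines. The sketch below is one possible recursive radix-2 implementation of equation (5), assuming the length of $c$ is a power of 2 and using the text's convention $w = e^{+2\pi i/n}$; the function names `fft_recursive` and `direct_product` are illustrative. The direct $n^2$ product is included only as a check.

```python
import cmath

def fft_recursive(c):
    """Return y = F_n c with w = exp(+2*pi*i/n); len(c) must be a power of 2."""
    n = len(c)
    if n == 1:
        return list(c)                      # F_1 is the 1 by 1 identity
    y_even = fft_recursive(c[0::2])         # y'  = F_m c'
    y_odd = fft_recursive(c[1::2])          # y'' = F_m c''
    w = cmath.exp(2j * cmath.pi / n)
    m = n // 2
    top = [y_even[j] + w ** j * y_odd[j] for j in range(m)]      # y_j
    bottom = [y_even[j] - w ** j * y_odd[j] for j in range(m)]   # y_{j+m}
    return top + bottom

def direct_product(c):
    """Ordinary matrix-vector product F_n c, using n*n multiplications."""
    n = len(c)
    w = cmath.exp(2j * cmath.pi / n)
    return [sum(w ** (j * k) * c[k] for k in range(n)) for j in range(n)]

c = [1, 2, 3, 4, 5, 6, 7, 8]
y_fast = fft_recursive(c)
y_slow = direct_product(c)
```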
We will count the individual multiplications, to see how much is saved. Before the FFT 
was invented, the count was the usual $n^2 = (1024)^2$. This is about a million multiplications. 
I am not saying that they take a long time. The cost becomes large when we have 
many, many transforms to do, which is typical. Then the saving by the FFT is also large: 


The final count for size $n = 2^\ell$ is reduced from $n^2$ to $\frac{1}{2}n\ell$. 


The number 1024 is $2^{10}$, so $\ell = 10$. The original count of $(1024)^2$ is reduced to 
$(5)(1024)$. The saving is a factor of 200. A million is reduced to five thousand. That is why 
the FFT has revolutionized signal processing. 

Here is the reasoning behind $\frac{1}{2}n\ell$. There are $\ell$ levels, going from $n = 2^\ell$ down to 
$n = 1$. Each level has $n/2$ multiplications from the diagonal $D$’s, to reassemble the half-size 
outputs from the lower level. This yields the final count $\frac{1}{2}n\ell$, which is $\frac{1}{2}n\log_2 n$. 
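
The count can also be read off from the recursion itself. This short sketch assumes the same accounting as the paragraph above (only the multiplications by the diagonal $D$ are counted) and uses an illustrative function name. A transform of size $n$ costs two transforms of size $n/2$ plus $n/2$ multiplications.

```python
def fft_multiplications(n):
    """Multiplications by D in a radix-2 FFT of size n (n a power of 2)."""
    if n == 1:
        return 0
    return 2 * fft_multiplications(n // 2) + n // 2

fast_count = fft_multiplications(1024)    # (1/2)(1024)(10)
direct_count = 1024 ** 2                  # ordinary matrix-vector product
saving = direct_count / fast_count        # roughly a factor of 200
```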

One last note about this remarkable algorithm. There is an amazing rule for the order 
that the c’s enter the FFT, after all the even-odd permutations. Write the numbers 0 to n — 1 
in binary (like 00,01, 10,11 for n = 4). Reverse the order of those digits: 00, 10, 01, 11. 
That gives the bit-reversed order 0, 2, 1, 3 with evens before odds. (See Problem 17.) 
The complete picture shows the $c$’s in bit-reversed order, the $\ell = \log_2 n$ steps of the 
recursion, and the final output $y_0, \ldots, y_{n-1}$ which is $F_n$ times $c$. 
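
The bit-reversal rule is easy to state in code. The following sketch, with the hypothetical name `bit_reversed_order`, writes each index in binary with $\log_2 n$ digits and reverses the digits; it assumes $n$ is a power of 2.

```python
def bit_reversed_order(n):
    """Indices 0..n-1 with their binary digits reversed (n a power of 2, n >= 2)."""
    bits = n.bit_length() - 1                       # log2(n) binary digits
    return [int(format(k, "0{}b".format(bits))[::-1], 2) for k in range(n)]

order4 = bit_reversed_order(4)   # 00, 01, 10, 11 become 00, 10, 01, 11
```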

The chapter ends with that very fundamental idea, a matrix multiplying a vector. 


Problem Set 9.3 


1 Multiply the three matrices in equation (3) and compare with $F$. In which six entries 
do you need to know that $i^2 = -1$? 

2 Invert the three factors in equation (3) to find a fast factorization of $F^{-1}$. 

3 F is symmetric. So transpose equation (3) to find a new Fast Fourier Transform! 

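4 All entries in the factorization of $F_6$ involve powers of $w_6$ = sixth root of 1: 


$$F_6 = \begin{bmatrix} I & D \\ I & -D \end{bmatrix} \begin{bmatrix} F_3 & \\ & F_3 \end{bmatrix} \begin{bmatrix} P \end{bmatrix}$$


Write down these matrices with $1, w_6, w_6^2$ in $D$ and $w_3 = w_6^2$ in $F_3$. Multiply! 


5 If $v = (1, 0, 0, 0)$ and $w = (1, 1, 1, 1)$, show that $Fv = w$ and $Fw = 4v$. Therefore 
$F^{-1}w = v$ and $F^{-1}v = $ ____. 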


6 What is $F^2$ and what is $F^4$ for the 4 by 4 Fourier matrix? 


7 Put the vector $c = (1, 0, 1, 0)$ through the three steps of the FFT to find $y = Fc$. Do 
the same for $c = (0, 1, 0, 1)$. 


8 Compute y = Fc by the three FFT steps for c = (1,0,1,0,1,0,1,0). Repeat the 
computation for c = (0, 1,0,1,0,1,0,1). 
If $w = e^{2\pi i/64}$ then $w^2$ and $\sqrt{w}$ are among the ____ and ____ roots of 1. 


(a) Draw all the sixth roots of 1 on the unit circle. Prove they add to zero. 


(b) What are the three cube roots of 1? Do they also add to zero? 


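The columns of the Fourier matrix $F$ are the eigenvectors of the cyclic permutation 
$P$ (see Section 8.3). Multiply $PF$ to find the eigenvalues $\lambda_1, \lambda_2, \lambda_3, \lambda_4$: 


$$\begin{bmatrix} 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \\ 1 & 0 & 0 & 0 \end{bmatrix} \begin{bmatrix} 1 & 1 & 1 & 1 \\ 1 & i & i^2 & i^3 \\ 1 & i^2 & i^4 & i^6 \\ 1 & i^3 & i^6 & i^9 \end{bmatrix} = \begin{bmatrix} 1 & 1 & 1 & 1 \\ 1 & i & i^2 & i^3 \\ 1 & i^2 & i^4 & i^6 \\ 1 & i^3 & i^6 & i^9 \end{bmatrix} \begin{bmatrix} \lambda_1 & & & \\ & \lambda_2 & & \\ & & \lambda_3 & \\ & & & \lambda_4 \end{bmatrix}$$


This is $PF = F\Lambda$ or $P = F\Lambda F^{-1}$. The eigenvector matrix (usually $X$) is $F$. 


The equation $\det(P - \lambda I) = 0$ is $\lambda^4 = 1$. This shows again that the eigenvalues 
are $\lambda = $ ____. Which permutation $P$ has eigenvalues = cube roots of 1? 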


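(a) Two eigenvectors of $C$ are $(1, 1, 1, 1)$ and $(1, i, i^2, i^3)$. Find the eigenvalues $e$. 


$$\begin{bmatrix} c_0 & c_1 & c_2 & c_3 \\ c_3 & c_0 & c_1 & c_2 \\ c_2 & c_3 & c_0 & c_1 \\ c_1 & c_2 & c_3 & c_0 \end{bmatrix} \begin{bmatrix} 1 \\ 1 \\ 1 \\ 1 \end{bmatrix} = e_1 \begin{bmatrix} 1 \\ 1 \\ 1 \\ 1 \end{bmatrix} \quad \text{and} \quad C \begin{bmatrix} 1 \\ i \\ i^2 \\ i^3 \end{bmatrix} = e_2 \begin{bmatrix} 1 \\ i \\ i^2 \\ i^3 \end{bmatrix}.$$


(b) $P = F\Lambda F^{-1}$ immediately gives $P^2 = F\Lambda^2 F^{-1}$ and $P^3 = F\Lambda^3 F^{-1}$. Then 
$C = c_0 I + c_1 P + c_2 P^2 + c_3 P^3 = F(c_0 I + c_1 \Lambda + c_2 \Lambda^2 + c_3 \Lambda^3)F^{-1} = FEF^{-1}$. 
That matrix $E$ in parentheses is diagonal. It contains the ____ of $C$. 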


Find the eigenvalues of the “periodic” −1, 2, −1 matrix from $E = 2I - \Lambda - \Lambda^3$, with 
the eigenvalues of $P$ in $\Lambda$. The −1’s in the corners make this matrix periodic: 


$$C = \begin{bmatrix} 2 & -1 & 0 & -1 \\ -1 & 2 & -1 & 0 \\ 0 & -1 & 2 & -1 \\ -1 & 0 & -1 & 2 \end{bmatrix} \quad \text{has } c_0 = 2,\ c_1 = -1,\ c_2 = 0,\ c_3 = -1.$$


Fast convolution = Fast multiplication by $C$: To multiply $C$ times a vector $x$, we 
can multiply $F(E(F^{-1}x))$ instead. The direct way uses $n^2$ separate multiplications. 
Knowing $E$ and $F$, the second way uses only $n\log_2 n + n$ multiplications. How 
many of those come from $E$, how many from $F$, and how many from $F^{-1}$? 


Notice. Why is row $i$ of $\overline{F}$ the same as row $N - i$ of $F$ (numbered 0 to $N - 1$)? 


What is the bit-reversed order of the numbers 0, 1, ..., 7? Write them all in binary 
(base 2) as 000, 001, ..., 111 and reverse each order. The 8 numbers are now ____. 
Applications 


10.1 Graphs and Networks 


Over the years I have seen one model so often, and I found it so basic and useful, that I 
always put it first. The model consists of nodes connected by edges. This is called a graph. 
Graphs of the usual kind display functions f(x). Graphs of this node-edge kind lead 
to matrices. This section is about the incidence matrix of a graph—which tells how the n 
nodes are connected by the m edges. Normally m > n, there are more edges than nodes. 


For any $m$ by $n$ matrix there are two fundamental subspaces in $R^n$ and two in $R^m$. They 
are the row spaces and nullspaces of $A$ and $A^T$. Their dimensions $r, n - r$ and $r, m - r$ 
come from the most important theorem in linear algebra. The second part of that theorem 
is the orthogonality of the row space and nullspace. Our goal is to show how examples 
from graphs illuminate this Fundamental Theorem of Linear Algebra. 


When I construct a graph and its incidence matrix, the subspace dimensions will be 
easy to discover. But we want the subspaces themselves—and orthogonality helps. It 
is essential to connect the subspaces to the graph they come from. By specializing to 
incidence matrices, the laws of linear algebra become Kirchhoff’s laws. Please don’t be 
put off by the words “current” and “voltage.” These rectangular matrices are the best. 


Every entry of an incidence matrix is 0 or 1 or —1. This continues to hold during 
elimination. All pivots and multipliers are ±1. Therefore both factors in $A = LU$ also 
contain 0,1, —1. So do the nullspace matrices! All four subspaces have basis vectors with 
these exceptionally simple components. The matrices are not concocted for a textbook, 
they come from a model that is absolutely essential in pure and applied mathematics. 


The Incidence Matrix 


Figure 10.1 displays a graph with m = 6 edges and n = 4 nodes. The 6 by 4 matrix A 
tells which nodes are connected by which edges. The first row —1,1,0,0 shows that the 
first edge goes from node 1 to node 2 (—1 for node 1 because the arrow goes out, +1 for 
node 2 with arrow in). 

Row numbers in A are edge numbers, column numbers 1, 2, 3, 4 are node numbers! 
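$$A = \begin{bmatrix} -1 & 1 & 0 & 0 \\ -1 & 0 & 1 & 0 \\ 0 & -1 & 1 & 0 \\ -1 & 0 & 0 & 1 \\ 0 & -1 & 0 & 1 \\ 0 & 0 & -1 & 1 \end{bmatrix} \quad \begin{matrix} 1 \\ 2 \\ 3 \\ 4 \\ 5 \\ 6 \end{matrix} \ \text{edge} \qquad (\text{columns: nodes } 1, 2, 3, 4)$$


Figure 10.1: Complete graph with $m = 6$ edges and $n = 4$ nodes: 6 by 4 incidence matrix $A$. 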


You can write down the matrix by looking at the graph. The second graph has the 
same four nodes but only three edges. Its incidence matrix B is 3 by 4. 


$$B = \begin{bmatrix} -1 & 1 & 0 & 0 \\ 0 & -1 & 1 & 0 \\ 0 & 0 & -1 & 1 \end{bmatrix} \quad \begin{matrix} 1 \\ 2 \\ 3 \end{matrix} \ \text{edge}$$


Figure 10.1*: Tree with 3 edges and 4 nodes and no loops. Then $B$ has independent rows. 
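
Writing down the matrix from the graph is mechanical, as this short sketch suggests. The edge lists below are read off from the two figures (edge $k$ goes from its first node to its second), and the function name `incidence_matrix` is an illustrative choice.

```python
def incidence_matrix(edges, n_nodes):
    """One row per edge: -1 at the node the arrow leaves, +1 at the node it enters."""
    rows = []
    for start, end in edges:
        row = [0] * n_nodes
        row[start - 1] = -1     # nodes are numbered from 1 in the text
        row[end - 1] = 1
        rows.append(row)
    return rows

complete_edges = [(1, 2), (1, 3), (2, 3), (1, 4), (2, 4), (3, 4)]
tree_edges = [(1, 2), (2, 3), (3, 4)]

A = incidence_matrix(complete_edges, 4)   # 6 by 4
B = incidence_matrix(tree_edges, 4)       # 3 by 4
```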


The first graph is complete—every pair of nodes is connected by an edge. The second 
graph is a tree—the graph has no closed loops. Those are the two extremes. The maximum 
number of edges is $\frac{1}{2}n(n - 1) = 6$ and the minimum to stay connected is $n - 1 = 3$. 

Elimination reduces every graph to a tree. Loops produce dependent rows in A and 
zero rows in the echelon forms U and R. Look at the large loop from edges 1, 2, 3 in the 
first graph, which leads to a zero row in U: 


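$$\begin{bmatrix} -1 & 1 & 0 & 0 \\ -1 & 0 & 1 & 0 \\ 0 & -1 & 1 & 0 \end{bmatrix} \rightarrow \begin{bmatrix} -1 & 1 & 0 & 0 \\ 0 & -1 & 1 & 0 \\ 0 & -1 & 1 & 0 \end{bmatrix} \rightarrow \begin{bmatrix} -1 & 1 & 0 & 0 \\ 0 & -1 & 1 & 0 \\ 0 & 0 & 0 & 0 \end{bmatrix}$$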


Those steps are typical. When edges 1 and 2 share node 1, elimination produces the “short- 
cut edge” without node 1. If the graph already has this shortcut edge making a loop, then 
elimination gives a row of zeros. When the dust clears we have a tree. 

An idea suggests itself: Rows are dependent when edges form a loop. Independent 
rows come from trees. This is the key to the row space. We are assuming that the graph 
is connected, and the arrows could go either way. On each edge, flow with the arrow is 
“positive.” Flow in the opposite direction counts as negative. The flow might be a current 
or a signal or a force—or even oil or gas or water. 
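When $x_1, x_2, x_3, x_4$ are voltages at the nodes, $Ax$ gives voltage differences: 


$$Ax = \begin{bmatrix} -1 & 1 & 0 & 0 \\ -1 & 0 & 1 & 0 \\ 0 & -1 & 1 & 0 \\ -1 & 0 & 0 & 1 \\ 0 & -1 & 0 & 1 \\ 0 & 0 & -1 & 1 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{bmatrix} = \begin{bmatrix} x_2 - x_1 \\ x_3 - x_1 \\ x_3 - x_2 \\ x_4 - x_1 \\ x_4 - x_2 \\ x_4 - x_3 \end{bmatrix} \qquad (1)$$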


Let me say that again. The incidence matrix A is a difference matrix. The input vector 
x gives voltages, the output vector Ax gives voltage differences (along edges 1 to 6). If the 
voltages are equal, the differences are zero. This tells us the nullspace of A. 


1 The nullspace contains the solutions to Ax = 0. All six voltage differences are zero. 
This means: All four voltages are equal. Every x in the nullspace is a constant vector: 
$x = (c, c, c, c)$. The nullspace of $A$ is a line in $R^n$: its dimension is $n - r = 1$. 

The second incidence matrix B has the same nullspace. It contains (1, 1, 1, 1): 


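$$\text{1-dimensional nullspace: same for the tree} \qquad Bx = \begin{bmatrix} -1 & 1 & 0 & 0 \\ 0 & -1 & 1 & 0 \\ 0 & 0 & -1 & 1 \end{bmatrix} \begin{bmatrix} 1 \\ 1 \\ 1 \\ 1 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix}$$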


We can raise or lower all voltages by the same amount $c$, without changing the differences. 
There is an “arbitrary constant” in the voltages. Compare this with the same statement 
for functions. We can raise or lower a function by $C$, without changing its derivative. 


Calculus adds “$+C$” to indefinite integrals. Graph theory adds $(c, c, c, c)$ to the vector $x$. 
Linear algebra adds any vector $x_n$ in the nullspace to one particular solution of $Ax = b$. 


The “$+C$” disappears in calculus when a definite integral starts at a known point. 
Similarly the nullspace disappears when we fix $x_4 = 0$. The unknown $x_4$ is removed and 
so are the fourth columns of $A$ and $B$ (those columns multiplied $x_4$). Electrical engineers 
would say that node 4 has been “grounded.” 


2 The row space contains all combinations of the six rows. Its dimension is certainly not 6. 
The equation r + (n — r) = n must be 3 + 1 = 4. The rank is r = 3, as we saw from 
elimination. After 3 edges, we start forming loops! The new rows are not independent. 

How can we tell if v = (v1, v2, v3, v4) is in the row space? The slow way is to combine 
rows. The quick way is by orthogonality: 


v is in the row space if and only if it is perpendicular to (1,1, 1,1) in the nullspace. 


The vector v = (0, 1, 2,3) fails this test—its components add to 6. The vector (—6, 1, 2, 3) 
is in the row space: -6+1+2+3 = 0. That vector equals 6(row 1)+5(row 3)+3(row 6). 
Each row of A adds to zero. This must be true for every vector in the row space. 
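
The orthogonality test for the row space takes one line of arithmetic. The sketch below repeats the incidence matrix from Figure 10.1 and checks the combination quoted above; the helper name `perpendicular_to_ones` is an illustrative assumption.

```python
A = [[-1, 1, 0, 0], [-1, 0, 1, 0], [0, -1, 1, 0],
     [-1, 0, 0, 1], [0, -1, 0, 1], [0, 0, -1, 1]]

def perpendicular_to_ones(v):
    """True when v is orthogonal to (1, 1, 1, 1), the basis of the nullspace."""
    return sum(v) == 0

# 6(row 1) + 5(row 3) + 3(row 6), with rows numbered from 1 as in the text
combo = [6 * A[0][j] + 5 * A[2][j] + 3 * A[5][j] for j in range(4)]

in_row_space = perpendicular_to_ones([-6, 1, 2, 3])
not_in_row_space = perpendicular_to_ones([0, 1, 2, 3])
```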
3 The column space contains all combinations of the four columns. We expect three 
independent columns, since there were three independent rows. The first three columns 
of A are independent (so are any three). But the four columns add to the zero vector, which 
says again that (1,1,1,1) is in the nullspace. How can we tell if a particular vector b 
is in the column space of an incidence matrix? 


First answer: Try to solve $Ax = b$. That misses all the insight. As before, orthogonality 
gives a better answer. We are now coming to Kirchhoff’s two famous laws of circuit 
theory—the voltage law and current law (KVL and KCL). Those are natural expressions 
of “laws” of linear algebra. It is especially pleasant to see the key role of the left nullspace. 


Second answer: $Ax$ is the vector of voltage differences $x_i - x_j$. If we add differences 
around a closed loop in the graph, they cancel to leave zero. Around the big triangle 
formed by edges 1, 3, −2 (the arrow goes backward on edge 2) the differences cancel: 


Sum of differences is 0: $(x_2 - x_1) + (x_3 - x_2) - (x_3 - x_1) = 0$. 
Kirchhoff’s Voltage Law: The components of $Ax = b$ add to zero around every loop. 
Around the big triangle: $b_1 + b_3 - b_2 = 0$. 


By testing each loop, the Voltage Law decides whether $b$ is in the column space. $Ax = b$ 
can be solved exactly when the components of $b$ satisfy all the same dependencies as the 
rows of $A$. Then elimination leads to $0 = 0$, and $Ax = b$ is consistent. 


4 The left nullspace contains the solutions to $A^T y = 0$. Its dimension is $m - r = 6 - 3$: 


$$\text{Current Law} \qquad A^T y = \begin{bmatrix} -1 & -1 & 0 & -1 & 0 & 0 \\ 1 & 0 & -1 & 0 & -1 & 0 \\ 0 & 1 & 1 & 0 & 0 & -1 \\ 0 & 0 & 0 & 1 & 1 & 1 \end{bmatrix} \begin{bmatrix} y_1 \\ y_2 \\ y_3 \\ y_4 \\ y_5 \\ y_6 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ 0 \\ 0 \end{bmatrix} \qquad (2)$$


The true number of equations is $r = 3$ and not $n = 4$. Reason: The four equations add to 
$0 = 0$. The fourth equation follows automatically from the first three. 

What do the equations mean? The first equation says that $-y_1 - y_2 - y_4 = 0$. The 
net flow into node 1 is zero. The fourth equation says that $y_4 + y_5 + y_6 = 0$. Flow into 
node 4 minus flow out is zero. The equations $A^T y = 0$ are famous and fundamental: 


Kirchhoff’s Current Law: ATy = 0 Flow in equals flow out at each node. 


This law deserves first place among the equations of applied mathematics. It expresses 
“conservation” and “continuity” and “balance.” Nothing is lost, nothing is gained. When 
currents or forces are balanced, the equation to solve is $A^T y = 0$. Notice the beautiful fact 
that the matrix in this balance equation is the transpose of the incidence matrix A. 
What are the actual solutions to $A^T y = 0$? The currents must balance themselves. 
The easiest way is to flow around a loop. If a unit of current goes around the big triangle 
(forward on edge 1 and 3, backward on 2), the six currents are y = (1, —1,1,0,0,0). 
This satisfies ATy = 0. Every loop current is a solution to the Current Law. Flow in 
equals flow out at every node. A smaller loop goes forward on edge 1, forward on 5, back 
on 4. Then y = (1, 0,0, —1, 1,0) is also in the left nullspace. 

We expect three independent y’s: m—r=6—3=3. The three small loops in the graph 
are independent. The big triangle seems to give a fourth y, but that flow is the sum of flows 
around the small loops. Flows around the 3 small loops are a basis for the left nullspace. 


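$$\begin{bmatrix} y_1 \\ y_2 \\ y_3 \\ y_4 \\ y_5 \\ y_6 \end{bmatrix} = \underbrace{\begin{bmatrix} 1 \\ 0 \\ 0 \\ -1 \\ 1 \\ 0 \end{bmatrix}, \ \begin{bmatrix} 0 \\ 0 \\ 1 \\ 0 \\ -1 \\ 1 \end{bmatrix}, \ \begin{bmatrix} 0 \\ -1 \\ 0 \\ 1 \\ 0 \\ -1 \end{bmatrix}}_{\text{3 small loops}} \qquad \text{their sum is} \quad \underbrace{\begin{bmatrix} 1 \\ -1 \\ 1 \\ 0 \\ 0 \\ 0 \end{bmatrix}}_{\text{big loop}}$$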
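
Both of Kirchhoff's laws can be tested on this example. In the sketch below the three loop vectors are the small loops described above (edges 1, 5, −4; edges 3, 6, −5; edges 4, −6, −2), and the helper names are illustrative. The first check is the Current Law $A^T y = 0$ for each loop. The second is the Voltage Law: a vector $b$ can be in the column space only if it is perpendicular to every loop vector.

```python
A = [[-1, 1, 0, 0], [-1, 0, 1, 0], [0, -1, 1, 0],
     [-1, 0, 0, 1], [0, -1, 0, 1], [0, 0, -1, 1]]

small_loops = [[1, 0, 0, -1, 1, 0],
               [0, 0, 1, 0, -1, 1],
               [0, -1, 0, 1, 0, -1]]

def current_law_residual(y):
    """Return A^T y: net flow into each of the four nodes."""
    return [sum(A[e][node] * y[e] for e in range(6)) for node in range(4)]

def satisfies_voltage_law(b):
    """True when the components of b add to zero around each small loop."""
    return all(sum(l * bi for l, bi in zip(loop, b)) == 0 for loop in small_loops)

big_loop = [sum(column) for column in zip(*small_loops)]

x = [1, 2, 4, 8]                                              # sample voltages
b = [sum(A[e][j] * x[j] for j in range(4)) for e in range(6)]  # b = Ax
```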


The incidence matrix A comes from a connected graph with n nodes and m edges. The 
row space and column space have dimensions r = n — 1. The nullspaces of A 
and AT have dimensions 1 and m — n + 1: 


$N(A)$ The constant vectors $(c, c, \ldots, c)$ make up the nullspace of $A$: dim = 1. 

$C(A^T)$ The edges of any tree give $r$ independent rows of $A$: $r = n - 1$. 

$C(A)$ Voltage Law: The components of $Ax$ add to zero around all loops: dim = $n - 1$. 

$N(A^T)$ Current Law: $A^T y$ = (flow in) − (flow out) = 0 is solved by loop currents. 
There are $m - r = m - n + 1$ independent small loops in the graph. 


For every graph in a plane, linear algebra yields Euler’s formula: Theorem 1 in topology! 
(number of nodes) — (number of edges) + (number of small loops) = 1. 
This is (n) — (m) + (m — n + 1) = 1. The graph in our example has 4 — 6 + 3 = 1. 


A single triangle has (3 nodes) — (3 edges) + (1 loop). On a 10-node tree with 9 edges 
and no loops, Euler’s count is 10 — 9 + 0. All planar graphs lead to the answer 1. 


The next figure shows a network with a current source. Kirchhoff’s Current Law changes 
from $A^T y = 0$ to $A^T y = f$, to balance the source $f$ from outside. Flow into each node 
still equals flow out. The six edges would have conductances $c_1, \ldots, c_6$, and the current 
source goes into node 1. The source comes out from node 4 to keep the overall balance 
(in = out). The problem is: Find the currents $y_1, \ldots, y_6$ on the six edges. 

Flows in networks now lead us from the incidence matrix A to the Laplacian matrix ATA. 
Voltages and Currents and $A^T A x = f$ 


We started with voltages $x = (x_1, \ldots, x_n)$ at the nodes. So far we have $Ax$ to find voltage 
differences $x_i - x_j$ along edges. And we have the Current Law $A^T y = 0$ to find edge 
currents $y = (y_1, \ldots, y_m)$. If all resistances in the network are 1, Ohm’s Law will match 
$y = Ax$. Then $A^T y = A^T A x = 0$. We are close but not quite there. 


Without any sources, the solution to $A^T A x = 0$ will just be no flow: $x = 0$ and $y = 0$. 
I can see three ways to produce $x \ne 0$ and $y \ne 0$. 


1 Assign fixed voltages $x_i$ to one or more nodes. 
2 Add batteries (voltage sources) in one or more edges. 
3 Add current sources going into one or more nodes. See Figure 10.2. 


Figure 10.2: The currents $y_1$ to $y_6$ in a network with a source $S$ from node 4 to node 1. 


Example Figure 10.2 includes a current source $S$ from node 4 to node 1. That current 
will trickle back through the network to node 4. Some current $y_4$ will go directly on edge 
4. Other current will go the long way from node 1 to 2 to 4, or 1 to 3 to 4. By symmetry 
I expect no current ($y_3 = 0$) from node 2 to node 3. Solving the network equations will 
confirm this. The matrix in those equations is $A^T A$, the graph Laplacian matrix: 


$$A^T A = \begin{bmatrix} -1 & -1 & 0 & -1 & 0 & 0 \\ 1 & 0 & -1 & 0 & -1 & 0 \\ 0 & 1 & 1 & 0 & 0 & -1 \\ 0 & 0 & 0 & 1 & 1 & 1 \end{bmatrix} \begin{bmatrix} -1 & 1 & 0 & 0 \\ -1 & 0 & 1 & 0 \\ 0 & -1 & 1 & 0 \\ -1 & 0 & 0 & 1 \\ 0 & -1 & 0 & 1 \\ 0 & 0 & -1 & 1 \end{bmatrix} = \begin{bmatrix} 3 & -1 & -1 & -1 \\ -1 & 3 & -1 & -1 \\ -1 & -1 & 3 & -1 \\ -1 & -1 & -1 & 3 \end{bmatrix}$$
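That Laplacian matrix is not invertible! We cannot solve for all four potentials because 
$(1, 1, 1, 1)$ is in the nullspace of $A$ and $A^T A$. One node has to be grounded. Setting $x_4 = 0$ 
removes the fourth row and column, and this leaves a 3 by 3 invertible matrix. Now we 
solve $A^T A x = f$ for the unknown potentials $x_1, x_2, x_3$, with source $S$ into node 1: 


$$\begin{bmatrix} 3 & -1 & -1 \\ -1 & 3 & -1 \\ -1 & -1 & 3 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} S \\ 0 \\ 0 \end{bmatrix} \quad \text{gives} \quad \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} S/2 \\ S/4 \\ S/4 \end{bmatrix}$$

$$\text{Currents } y = -Ax \qquad \begin{bmatrix} y_1 \\ y_2 \\ y_3 \\ y_4 \\ y_5 \\ y_6 \end{bmatrix} = - \begin{bmatrix} -1 & 1 & 0 & 0 \\ -1 & 0 & 1 & 0 \\ 0 & -1 & 1 & 0 \\ -1 & 0 & 0 & 1 \\ 0 & -1 & 0 & 1 \\ 0 & 0 & -1 & 1 \end{bmatrix} \begin{bmatrix} S/2 \\ S/4 \\ S/4 \\ 0 \end{bmatrix} = \begin{bmatrix} S/4 \\ S/4 \\ 0 \\ S/2 \\ S/4 \\ S/4 \end{bmatrix}$$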


Half the current goes directly on edge 4. That is y4 = S/2. No current crosses from node 
2 to node 3. Symmetry indicated y3 = 0 and now the solution proves it. 
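
The whole example can be reproduced with exact arithmetic. This sketch assumes a source of size $S = 4$ (chosen only to keep the answers whole numbers) and unit resistances; the small Gauss-Jordan routine `solve` is written here for illustration and is not a library call. It forms $A^T A$, grounds node 4, solves for the potentials, and computes $y = -Ax$.

```python
from fractions import Fraction

A = [[-1, 1, 0, 0], [-1, 0, 1, 0], [0, -1, 1, 0],
     [-1, 0, 0, 1], [0, -1, 0, 1], [0, 0, -1, 1]]

def solve(M, rhs):
    """Gauss-Jordan elimination with exact fractions; M is assumed invertible."""
    n = len(M)
    aug = [[Fraction(v) for v in row] + [Fraction(r)] for row, r in zip(M, rhs)]
    for col in range(n):
        pivot_row = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot_row] = aug[pivot_row], aug[col]
        pivot = aug[col][col]
        aug[col] = [v / pivot for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[n] for row in aug]

S = 4                                                     # assumed source strength
laplacian = [[sum(A[e][p] * A[e][q] for e in range(6)) for q in range(4)]
             for p in range(4)]                           # A^T A
grounded = [row[:3] for row in laplacian[:3]]             # remove row and column 4
x = solve(grounded, [S, 0, 0]) + [Fraction(0)]            # potentials, with x_4 = 0
y = [-sum(A[e][j] * x[j] for j in range(4)) for e in range(6)]   # currents y = -Ax
```

With $S = 4$ the potentials should be $(S/2, S/4, S/4, 0) = (2, 1, 1, 0)$, and the current on edge 3 should vanish.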


Admission of error I remembered that current flows from high voltage to low voltage. 
That produces the minus sign in $y = -Ax$. And the correct form of Ohm’s Law will be 
$Ry = -Ax$ when the resistances on the edges are not all 1. Conductances are neater than 
resistances: $C = R^{-1}$ = diagonal matrix. We now present Ohm’s Law $y = -CAx$. 


Networks and $A^T C A$ 


In a real network, the current y along an edge is the product of two numbers. One number 
is the difference between the potentials x at the ends of the edge. This voltage difference 
is Ax and it drives the flow. The other number c is the “conductance”—which measures 
how easily flow gets through. 

In physics and engineering, c is decided by the material. For electrical currents, c 
is high for metal and low for plastics. For a superconductor, c is nearly infinite. If we 
consider elastic stretching, c might be low for metal and higher for plastics. In economics, 
c measures the capacity of an edge or its cost. 

To summarize, the graph is known from its incidence matrix A. This tells the node- 
edge connections. A network goes further, and assigns a conductance c to each edge. 
These numbers c1,..., Cm go into the “conductance matrix” C—which is diagonal. 


For a network of resistors, the conductance is $c = 1/(\text{resistance})$. In addition to
Kirchhoff’s Laws for the whole system of currents, we have Ohm’s Law for each current.
Ohm’s Law connects the current $y_1$ on edge 1 to the voltage difference $x_2 - x_1$:


Ohm’s Law: Current along edge = conductance times voltage difference. 


Ohm’s Law for all $m$ currents is $y = -CAx$. The vector $Ax$ gives the potential
differences, and $C$ multiplies by the conductances. Combining Ohm’s Law with Kirchhoff’s
Current Law $A^Ty = 0$, we get $A^TCAx = 0$. This is almost the central equation for
network flows. The only thing wrong is the zero on the right side! The network needs power
from outside (a voltage source or a current source) to make something happen.


Note about signs In circuit theory we change from $Ax$ to $-Ax$. The flow is from higher
potential to lower potential. There is (positive) current from node 1 to node 2 when $x_1 - x_2$
is positive, whereas $Ax$ was constructed to yield $x_2 - x_1$. The minus sign in physics and
electrical engineering is a plus sign in mechanical engineering and economics. $Ax$ versus
$-Ax$ is a general headache but unavoidable.


Note about applied mathematics Every new application has its own form of Ohm’s Law.
For springs it is Hooke’s Law. The stress $y$ is (elasticity $C$) times (stretching $Ax$). For
heat conduction, $Ax$ is a temperature gradient. For oil flows it is a pressure gradient. For
least squares regression in statistics (Chapter 12) $C^{-1}$ is the covariance matrix.

My textbooks Introduction to Applied Mathematics and Computational Science and
Engineering (Wellesley-Cambridge Press) are practically built on $A^TCA$. This is the key
to equilibrium in matrix equations and also in differential equations. Applied mathematics
is more organized than it looks! In new problems I have learned to watch for $A^TCA$.


Problem Set 10.1 


Problems 1-7 and 8-14 are about the incidence matrices for these graphs. 


[Figure: a triangle graph with nodes 1, 2, 3 and edges 1, 2, 3, and a square graph with nodes 1, 2, 3, 4 and two loops.]


1 Write down the 3 by 3 incidence matrix $A$ for the triangle graph. The first row has
$-1$ in column 1 and $+1$ in column 2. What vectors $(x_1, x_2, x_3)$ are in its nullspace?
How do you know that $(1, 0, 0)$ is not in its row space?


2 Write down $A^T$ for the triangle graph. Find a vector $y$ in its nullspace. The
components of $y$ are currents on the edges: how much current is going around the triangle?


3 Eliminate $x_1$ and $x_2$ from the third equation to find the echelon matrix $U$. What tree
corresponds to the two nonzero rows of $U$?

$$
\begin{aligned}
-x_1 + x_2 &= b_1 \\
-x_1 + x_3 &= b_2 \\
-x_2 + x_3 &= b_3.
\end{aligned}
$$


4 Choose a vector $(b_1, b_2, b_3)$ for which $Ax = b$ can be solved, and another vector $b$
that allows no solution. How are those $b$’s related to $y = (1, -1, 1)$?


5 Choose a vector $(f_1, f_2, f_3)$ for which $A^Ty = f$ can be solved, and a vector $f$
that allows no solution. How are those $f$’s related to $x = (1, 1, 1)$? The equation
$A^Ty = f$ is Kirchhoff’s law.


6 Multiply matrices to find $A^TA$. Choose a vector $f$ for which $A^TAx = f$ can be
solved, and solve for $x$. Put those potentials $x$ and the currents $y = -Ax$ and
current sources $f$ onto the triangle graph. Conductances are 1 because $C = I$.


7 With conductances $c_1 = 1$ and $c_2 = c_3 = 2$, multiply matrices to find $A^TCA$. For
$f = (1, 0, -1)$ find a solution to $A^TCAx = f$. Write the potentials $x$ and currents
$y = -CAx$ on the triangle graph, when the current source $f$ goes into node 1 and
out from node 3.


8 Write down the 5 by 4 incidence matrix $A$ for the square graph with two loops. Find
one solution to $Ax = 0$ and two solutions to $A^Ty = 0$.


9 Find two requirements on the $b$’s for the five differences $x_2 - x_1$, $x_3 - x_1$, $x_3 - x_2$,
$x_4 - x_2$, $x_4 - x_3$ to equal $b_1, b_2, b_3, b_4, b_5$. You have found Kirchhoff’s ____ law
around the two ____ in the graph.


10 Reduce $A$ to its echelon form $U$. The three nonzero rows give the incidence matrix
for what graph? You found one tree in the square graph: find the other seven trees.


11 Multiply matrices to find $A^TA$ and guess how its entries come from the graph:

(a) The diagonal of $A^TA$ tells how many ____ into each node.

(b) The off-diagonals $-1$ or $0$ tell which pairs of nodes are ____.


12 Why is each statement true about $A^TA$? Answer for $A^TA$ not $A$.

(a) Its nullspace contains $(1, 1, 1, 1)$. Its rank is $n - 1$.

(b) It is positive semidefinite but not positive definite.

(c) Its four eigenvalues are real and their signs are ____.


13 With conductances $c_1 = c_2 = 2$ and $c_3 = c_4 = c_5 = 3$, multiply the matrices
$A^TCA$. Find a solution to $A^TCAx = f = (1, 0, 0, -1)$. Write these potentials $x$
and currents $y = -CAx$ on the nodes and edges of the square graph.


14 The matrix $A^TCA$ is not invertible. What vectors $x$ are in its nullspace? Why does
$A^TCAx = f$ have a solution if and only if $f_1 + f_2 + f_3 + f_4 = 0$?


15 A connected graph with 7 nodes and 7 edges has how many loops?


16 For the graph with 4 nodes, 6 edges, and 3 loops, add a new node. If you connect it
to one old node, Euler’s formula becomes ( ) − ( ) + ( ) = 1. If you connect it
to two old nodes, Euler’s formula becomes ( ) − ( ) + ( ) = 1.


17 Suppose $A$ is a 12 by 9 incidence matrix from a connected (but unknown) graph.

(a) How many columns of $A$ are independent?

(b) What condition on $f$ makes it possible to solve $A^Ty = f$?

(c) The diagonal entries of $A^TA$ give the number of edges into each node. What is
the sum of those diagonal entries?


18 Why does a complete graph with $n = 6$ nodes have $m = 15$ edges? A tree
connecting 6 nodes has ____ edges.


Note The stoichiometric matrix in chemistry is an important “generalized” incidence 
matrix. Its entries show how much of each chemical species (each column) goes into 
each reaction (each row). 
10.2 Matrices in Engineering


This section will show how engineering problems produce symmetric matrices $K$ (often
$K$ is positive definite). The “linear algebra reason” for symmetry and positive definiteness
is their form $K = A^TA$ and $K = A^TCA$. The “physical reason” is that the expression
$\frac{1}{2}u^TKu$ represents energy, and energy is never negative. The matrix $C$, often diagonal,
contains positive physical constants like conductance or stiffness or diffusivity.

Our best examples come from mechanical and civil and aeronautical engineering.
$K$ is the stiffness matrix, and $K^{-1}f$ is the structure’s response to forces $f$ from outside.
Section 10.1 turned to electrical engineering, where the matrices came from networks and
circuits. The exercises involve chemical engineering and I could go on! Economics and
management and engineering design come later in this chapter (the key is optimization).


Engineering leads to linear algebra in two ways, directly and indirectly:

Direct way The physical problem has only a finite number of pieces. The laws
connecting their position or velocity are linear (movement is not too big or too fast).
The laws are expressed by matrix equations.

Indirect way The physical system is “continuous”. Instead of individual masses, the
mass density and the forces and the velocities are functions of $x$ or $x, y$ or $x, y, z$.
The laws are expressed by differential equations. To find accurate solutions we
approximate by finite difference equations or finite element equations.

Both ways produce matrix equations and linear algebra. I really believe that you cannot
do modern engineering without matrices.

Here we present equilibrium equations $Ku = f$. With motion, $M\,d^2u/dt^2 + Ku = f$
becomes dynamic. Then we would use eigenvalues from $Kx = \lambda Mx$, or finite differences.


Differential Equation to Matrix Equation 


Differential equations are continuous. Our basic example will be $-d^2u/dx^2 = f(x)$.
Matrix equations are discrete. Our basic example will be $K_0u = f$. By taking the step
from second derivatives to second differences, you will see the big picture in a very short
space. Start with fixed boundary conditions at both ends $x = 0$ and $x = 1$:

$$
\text{Fixed-fixed boundary value problem} \qquad -\frac{d^2u}{dx^2} = 1 \quad\text{with } u(0) = 0 \text{ and } u(1) = 0. \tag{1}
$$

That differential equation is linear. A particular solution is $u_p = -\frac{1}{2}x^2$ (then $d^2u/dx^2 = -1$).
We can add any function “in the nullspace”. Instead of solving $Ax = 0$ for a vector $x$,
we solve $-d^2u/dx^2 = 0$ for a function $u_n(x)$. (Main point: The right side is zero.)

The nullspace solutions are $u_n(x) = C + Dx$ (a 2-dimensional nullspace for a
second order differential equation). The complete solution is $u_p + u_n$:

$$
\text{Complete solution to } -\frac{d^2u}{dx^2} = 1 \qquad u(x) = -\frac{1}{2}x^2 + C + Dx. \tag{2}
$$

Now find $C$ and $D$ from the two boundary conditions: Set $x = 0$ and then $x = 1$. At
$x = 0$, $u(0) = 0$ forces $C = 0$. At $x = 1$, $u(1) = 0$ forces $-\frac{1}{2} + D = 0$. Then $D = \frac{1}{2}$:

$$
u(x) = -\frac{1}{2}x^2 + \frac{1}{2}x = \frac{1}{2}(x - x^2) \quad\text{solves the fixed-fixed boundary value problem.} \tag{3}
$$


Differences Replace Derivatives 


To get matrices instead of derivatives, we have three basic choices: forward or backward
or centered differences. Start with first derivatives and first differences:

$$
\frac{du}{dx} \approx \frac{u(x + \Delta x) - u(x)}{\Delta x}
\quad\text{or}\quad
\frac{u(x) - u(x - \Delta x)}{\Delta x}
\quad\text{or}\quad
\frac{u(x + \Delta x) - u(x - \Delta x)}{2\,\Delta x}.
$$
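
To see how the three choices differ, here is a small Python sketch. The test function $u(x) = x^2$, the point $x = 1/2$ and the step $\Delta x = 1/4$ are assumptions chosen for illustration; they are not taken from the text.

```python
from fractions import Fraction

def forward(u, x, dx):
    """Forward difference (u(x+dx) - u(x)) / dx."""
    return (u(x + dx) - u(x)) / dx

def backward(u, x, dx):
    """Backward difference (u(x) - u(x-dx)) / dx."""
    return (u(x) - u(x - dx)) / dx

def centered(u, x, dx):
    """Centered difference (u(x+dx) - u(x-dx)) / (2 dx)."""
    return (u(x + dx) - u(x - dx)) / (2 * dx)

def u(x):
    return x * x          # illustrative test function, true derivative 2x

x, dx = Fraction(1, 2), Fraction(1, 4)
estimates = {
    "forward": forward(u, x, dx),
    "backward": backward(u, x, dx),
    "centered": centered(u, x, dx),
}
for name, value in estimates.items():
    print(name, value, "error", value - 2 * x)
```

For this quadratic the forward and backward estimates are off by $+\Delta x$ and $-\Delta x$, while the centered estimate is exact.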
Between $x = 0$ and $x = 1$, we divide the interval into $n + 1$ equal pieces. The pieces have
width $\Delta x = 1/(n + 1)$. The values of $u$ at the $n$ breakpoints $\Delta x, 2\Delta x, \ldots$ will be the
unknowns $u_1$ to $u_n$ in our matrix equation $Ku = f$:


Solution to compute: $u = (u_1, u_2, \ldots, u_n) \approx (u(\Delta x), u(2\Delta x), \ldots, u(n\Delta x))$.
Zero values $u_0 = u_{n+1} = 0$ come from the boundary conditions $u(0) = u(1) = 0$.


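Replace the derivatives in $-\dfrac{d}{dx}\left(\dfrac{du}{dx}\right) = 1$ by forward and backward differences:

$$
\frac{1}{(\Delta x)^2}
\begin{bmatrix} 1 & -1 & 0 & 0 \\ 0 & 1 & -1 & 0 \\ 0 & 0 & 1 & -1 \end{bmatrix}
\begin{bmatrix} 1 & 0 & 0 \\ -1 & 1 & 0 \\ 0 & -1 & 1 \\ 0 & 0 & -1 \end{bmatrix}
\begin{bmatrix} u_1 \\ u_2 \\ u_3 \end{bmatrix}
=
\begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix} \tag{4}
$$

This is our matrix equation when $n = 3$ and $\Delta x = \frac{1}{4}$. The two first differences
are transposes of each other! The equation is $A^TAu = (\Delta x)^2 f$. When we multiply $A^TA$,
we get the positive definite second difference matrix $K_0$:

$$
K_0u = (\Delta x)^2 f \qquad
\begin{bmatrix} 2 & -1 & 0 \\ -1 & 2 & -1 \\ 0 & -1 & 2 \end{bmatrix}
\begin{bmatrix} u_1 \\ u_2 \\ u_3 \end{bmatrix}
= \frac{1}{16}
\begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix}
\quad\text{gives}\quad
\begin{bmatrix} u_1 \\ u_2 \\ u_3 \end{bmatrix}
=
\begin{bmatrix} 3/32 \\ 4/32 \\ 3/32 \end{bmatrix}. \tag{5}
$$

The wonderful fact in this example is that those numbers $u_1, u_2, u_3$ are exactly correct!
They agree with the true solution $u = \frac{1}{2}(x - x^2)$ at the three meshpoints $x = \frac{1}{4}, \frac{2}{4}, \frac{3}{4}$.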


Figure 10.3 shows the true solution (continuous curve) and the approximations u1, u2, u3 
(lying exactly on the curve). This curve is a parabola. 
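
The agreement at the meshpoints can be confirmed in exact arithmetic. This is a minimal Python sketch for $n = 3$ and $\Delta x = 1/4$; the function name `solve_k0` is a hypothetical helper that eliminates on the $-1, 2, -1$ matrix directly.

```python
from fractions import Fraction

def solve_k0(rhs):
    """Solve K0 u = rhs for the tridiagonal -1, 2, -1 matrix by elimination."""
    n = len(rhs)
    diag = [Fraction(2)] * n
    b = [Fraction(v) for v in rhs]
    for i in range(1, n):                 # forward elimination
        m = Fraction(-1) / diag[i - 1]    # multiplier for the -1 below the pivot
        diag[i] = diag[i] + m             # 2 - 1/(previous pivot)
        b[i] = b[i] - m * b[i - 1]
    u = [Fraction(0)] * n
    u[-1] = b[-1] / diag[-1]
    for i in range(n - 2, -1, -1):        # back substitution
        u[i] = (b[i] + u[i + 1]) / diag[i]
    return u

n = 3
dx = Fraction(1, n + 1)
u = solve_k0([dx ** 2] * n)               # right side (dx)^2 f with f = 1
exact = [Fraction(1, 2) * (k * dx - (k * dx) ** 2) for k in range(1, n + 1)]

print("discrete:", u)
print("exact:   ", exact)
```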


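[Figure: the parabola $u(x)$ on $0 \le x \le 1$, with the meshpoints $\Delta x, 2\Delta x, 3\Delta x$ marked and $4\Delta x = 1$.]

Figure 10.3: Solutions to $-\dfrac{d^2u}{dx^2} = 1$ and $K_0u = (\Delta x)^2 f$ with fixed-fixed boundaries.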


How to explain this perfect answer, lying right on the graph of $u(x)$? In the matrix
equation, $K_0 = A^TA$ is a “second difference matrix.” It gives a centered approximation
to $-d^2u/dx^2$. I included the minus sign because the first derivative is antisymmetric.
The second derivative by itself is negative:

The “transpose” of $\dfrac{d}{dx}$ is $-\dfrac{d}{dx}$. Then $\left(-\dfrac{d}{dx}\right)\left(\dfrac{d}{dx}\right)$ is positive definite.

You can see that in the matrices $A$ and $A^T$. The transpose of $A$ = forward difference is
$A^T$ = $-$ backward difference. I don’t want to choose a centered $u(x + \Delta x) - u(x - \Delta x)$.
Centered is the best for a first difference, but then the second difference $A^TA$ would
stretch from $u(x + 2\Delta x)$ to $u(x - 2\Delta x)$: not good.

Now we can explain the perfect answers, exactly on the true curve $u(x) = \frac{1}{2}(x - x^2)$.
Second differences $-1, 2, -1$ are exactly correct for straight lines $y = x$ and parabolas!

$$
\begin{aligned}
y = x: &\quad -\frac{d^2y}{dx^2} = 0 &\quad -(x + \Delta x) + 2x - (x - \Delta x) &= 0\,(\Delta x)^2 \\
y = x^2: &\quad -\frac{d^2y}{dx^2} = -2 &\quad -(x + \Delta x)^2 + 2x^2 - (x - \Delta x)^2 &= -2\,(\Delta x)^2
\end{aligned}
$$

The miracle continues to $y = x^3$. The correct $-d^2y/dx^2 = -6x$ is produced by
second differences. But for $y = x^4$ we return to earth. Second differences don’t exactly
match $-y'' = -12x^2$. The approximations $u_1, u_2, u_3$ won’t fall on the graph of $u(x)$.
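
The claim about powers of $x$ is easy to test. The sketch below (Python, exact fractions) compares the $-1, 2, -1$ second difference divided by $(\Delta x)^2$ with $-d^2y/dx^2$ for $y = x^p$, $p = 1, \ldots, 4$. The sample point $x = 1/2$ and step $\Delta x = 1/4$ are arbitrary choices for illustration.

```python
from fractions import Fraction

def neg_second_difference(p, x, dx):
    """Return -(x+dx)^p + 2 x^p - (x-dx)^p, the -1, 2, -1 difference of y = x^p."""
    return -(x + dx) ** p + 2 * x ** p - (x - dx) ** p

def neg_second_derivative(p, x):
    """Return -d^2/dx^2 of x^p, which is -p (p-1) x^(p-2)."""
    return -p * (p - 1) * x ** (p - 2) if p >= 2 else Fraction(0)

x, dx = Fraction(1, 2), Fraction(1, 4)
matches = []
for p in (1, 2, 3, 4):
    difference = neg_second_difference(p, x, dx) / dx ** 2
    derivative = neg_second_derivative(p, x)
    matches.append(difference == derivative)
    print(p, difference, derivative, difference == derivative)
```

For $p = 4$ the difference quotient picks up an extra $-2(\Delta x)^2$, so it no longer equals $-12x^2$.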
Fixed End and Free End and Variable Coefficient c(x)


To see two new possibilities, I will change the equation and also one boundary condition:

$$
-\frac{d}{dx}\left((1 + x)\frac{du}{dx}\right) = f(x) \quad\text{with } u(0) = 0 \text{ and } \frac{du}{dx}(1) = 0. \tag{6}
$$

The end $x = 1$ is now free. There is no support at that end. “A hanging bar is fixed
only at the top.” There is no force at the free end $x = 1$. That translates to $du/dx = 0$
instead of the fixed condition $u = 0$ at $x = 1$.

The other change is in the coefficient $c(x) = 1 + x$. The stiffness of the bar is
varying as you go from $x = 0$ to $x = 1$. Maybe its width is changing, or the material
changes. This coefficient $1 + x$ will bring a new matrix $C$ into the difference equation.

Since $u_4$ is no longer fixed at 0, it becomes a new unknown. The backward difference
$A$ is 4 by 4. And the multiplication by $c(x) = 1 + x$ becomes a diagonal matrix $C$, which
multiplies by $1 + \Delta x, \ldots, 1 + 4\Delta x$ at the meshpoints. Here are $A^T$, $C$, and $A$:

$$
A^TCA =
\begin{bmatrix} 1 & -1 & 0 & 0 \\ 0 & 1 & -1 & 0 \\ 0 & 0 & 1 & -1 \\ 0 & 0 & 0 & 1 \end{bmatrix}
\begin{bmatrix} 1.25 & & & \\ & 1.5 & & \\ & & 1.75 & \\ & & & 2.0 \end{bmatrix}
\begin{bmatrix} 1 & 0 & 0 & 0 \\ -1 & 1 & 0 & 0 \\ 0 & -1 & 1 & 0 \\ 0 & 0 & -1 & 1 \end{bmatrix} \tag{7}
$$

This matrix $K = A^TCA$ will be symmetric and positive definite! Symmetric because
$(A^TCA)^T = A^TC^TA^{TT} = A^TCA$. Positive definite because it passes the energy test:
$A$ has independent columns, so $Ax \neq 0$ when $x \neq 0$.

$$
\text{Energy} = x^TA^TCAx = (Ax)^TC(Ax) > 0 \quad\text{for every } x \neq 0, \text{ because } Ax \neq 0.
$$

When you multiply the matrices $A^TA$ and $A^TCA$ for this fixed-free combination, watch
how 1 replaces 2 in the last corner of $A^TA$. That fourth equation has $u_4 - u_3$, a first
(not second) difference coming from the free boundary condition $du/dx = 0$.

Notice in $A^TCA$ how $c_1, c_2, c_3, c_4$ come from $c(x) = 1 + x$ in equation (7). Previously
the $c$’s were simply 1, 1, 1, 1. Here are the fixed-free matrices:

$$
A^TA = \begin{bmatrix} 2 & -1 & & \\ -1 & 2 & -1 & \\ & -1 & 2 & -1 \\ & & -1 & 1 \end{bmatrix}
\qquad
A^TCA = \begin{bmatrix} c_1 + c_2 & -c_2 & & \\ -c_2 & c_2 + c_3 & -c_3 & \\ & -c_3 & c_3 + c_4 & -c_4 \\ & & -c_4 & c_4 \end{bmatrix} \tag{8}
$$
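
As a check on the pattern in $A^TCA$, here is a minimal Python sketch that multiplies the three matrices for $\Delta x = 1/4$ and $c(x) = 1 + x$. The helpers `transpose` and `matmul` are illustrative names; plain lists and exact fractions are used so no library is assumed.

```python
from fractions import Fraction

def transpose(M):
    return [list(col) for col in zip(*M)]

def matmul(P, Q):
    return [[sum(p * q for p, q in zip(row, col)) for col in zip(*Q)] for row in P]

n = 4
dx = Fraction(1, n)
# Backward difference matrix A (4 by 4): row i gives u_i - u_{i-1}, with u_0 = 0.
A = [[1 if j == i else -1 if j == i - 1 else 0 for j in range(n)] for i in range(n)]
# Diagonal C holds c(x) = 1 + x at the meshpoints dx, 2dx, 3dx, 4dx.
c = [1 + (i + 1) * dx for i in range(n)]
C = [[c[i] if i == j else 0 for j in range(n)] for i in range(n)]

K = matmul(transpose(A), matmul(C, A))
for row in K:
    print([str(v) for v in row])
```

The last diagonal entry is $c_4 = 2$ alone, not $c_4 + c_5$, which is the discrete trace of the free end.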
Free-free Boundary Conditions


Suppose both ends of the bar are free. Now $du/dx = 0$ at both $x = 0$ and $x = 1$.
Nothing is holding the bar in place! Physically it is unstable: it can move with no force.
Mathematically all constant functions like $u = 1$ satisfy these free conditions. Algebraically
our matrices $A^TA$ and $A^TCA$ will not be invertible:

Free-free examples, unknowns $u_0, u_1, u_2$, $\Delta x = 0.5$:

$$
A^TA = \begin{bmatrix} 1 & -1 & 0 \\ -1 & 2 & -1 \\ 0 & -1 & 1 \end{bmatrix}
\qquad
A^TCA = \begin{bmatrix} c_0 & -c_0 & 0 \\ -c_0 & c_0 + c_1 & -c_1 \\ 0 & -c_1 & c_1 \end{bmatrix}
$$

The vector $(1, 1, 1)$ is in both nullspaces. This matches $u(x) = 1$ in the continuous
problem. Free-free $A^TAu = f$ and $A^TCAu = f$ are generally unsolvable.


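Before explaining more physical examples, may I write down six of the matrices? The
tridiagonal $K_0$ appears many times in this textbook. Now we are seeing its applications.
These matrices are all symmetric, and the first four are positive definite:

Fixed-fixed (left) and with spring constants included (right):

$$
K_0 = A_0^TA_0 = \begin{bmatrix} 2 & -1 & \\ -1 & 2 & -1 \\ & -1 & 2 \end{bmatrix}
\qquad
A_0^TC_0A_0 = \begin{bmatrix} c_1 + c_2 & -c_2 & \\ -c_2 & c_2 + c_3 & -c_3 \\ & -c_3 & c_3 + c_4 \end{bmatrix}
$$

Fixed-free (left) and with spring constants included (right):

$$
K_1 = A_1^TA_1 = \begin{bmatrix} 2 & -1 & \\ -1 & 2 & -1 \\ & -1 & 1 \end{bmatrix}
\qquad
A_1^TC_1A_1 = \begin{bmatrix} c_1 + c_2 & -c_2 & \\ -c_2 & c_2 + c_3 & -c_3 \\ & -c_3 & c_3 \end{bmatrix}
$$

Free-free (left) and periodic $u(0) = u(1)$ (right):

$$
K_{\text{singular}} = \begin{bmatrix} 1 & -1 & \\ -1 & 2 & -1 \\ & -1 & 1 \end{bmatrix}
\qquad
K_{\text{circular}} = \begin{bmatrix} 2 & -1 & -1 \\ -1 & 2 & -1 \\ -1 & -1 & 2 \end{bmatrix}
$$

The matrices $K_0$, $K_1$, $K_{\text{singular}}$, and $K_{\text{circular}}$ have $C = I$ for simplicity. This means
that all the “spring constants” are $c_i = 1$. We included $A_0^TC_0A_0$ and $A_1^TC_1A_1$ to show how
the spring constants enter the matrix (without changing its positive definiteness).
Our next goal is to see these same stiffness matrices in other engineering problems.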
A Line of Springs and Masses


Figure 10.4 shows three masses $m_1, m_2, m_3$ connected by a line of springs. The
fixed-fixed case has four springs, with top and bottom fixed. That leads to $K_0$ and $A_0^TC_0A_0$. The
fixed-free case has only three springs; the lowest mass hangs freely. That will lead to $K_1$
and $A_1^TC_1A_1$. A free-free problem produces $K_{\text{singular}}$.

We want equations for the mass movements $u$ and the spring tensions $y$:


$u = (u_1, u_2, u_3)$ = movements of the masses (down is positive)
$y = (y_1, y_2, y_3, y_4)$ or $(y_1, y_2, y_3)$ = tensions in the springs

[Figure, two panels. Fixed-fixed: fixed end $u_0 = 0$; springs $c_1, c_2, c_3, c_4$ with tensions $y_1, y_2, y_3, y_4$; masses $m_1, m_2, m_3$ with movements $u_1, u_2, u_3$; fixed end $u_4 = 0$. Fixed-free: fixed end $u_0 = 0$; springs $c_1, c_2, c_3$ with tensions $y_1, y_2, y_3$; masses $m_1, m_2, m_3$ with movements $u_1, u_2, u_3$; free end $y_4 = 0$.]


Figure 10.4: Lines of springs and masses: fixed-fixed and fixed-free ends.


When a mass moves downward, its displacement is positive ($u_i > 0$). For the springs,
tension is positive and compression is negative ($y_i < 0$). In tension, the spring is stretched
so it pulls the masses inward. Each spring is controlled by its own Hooke’s Law $y = ce$:
(stretching force $y$) = (spring constant $c$) times (stretching distance $e$).

Our job is to link these one-spring equations $y = ce$ into a vector equation $Ku = f$
for the whole system. The force vector $f$ comes from gravity. The gravitational constant
$g$ will multiply each mass to produce downward forces $f = (m_1g, m_2g, m_3g)$.

The real problem is to find the stiffness matrix (fixed-fixed and fixed-free). The best
way to create $K$ is in three steps, not one. Instead of connecting the movements $u_i$ directly
to the forces $f_i$, it is much better to connect each vector to the next in this list:


$u$ = Movements of $n$ masses = $(u_1, \ldots, u_n)$
$e$ = Elongations of $m$ springs = $(e_1, \ldots, e_m)$
$y$ = Internal forces in $m$ springs = $(y_1, \ldots, y_m)$
$f$ = External forces on $n$ masses = $(f_1, \ldots, f_n)$


A great framework for applied mathematics connects $u$ to $e$ to $y$ to $f$. Then $A^TCAu = f$:


$e = Au$  ($A$ is $m$ by $n$)
$y = Ce$  ($C$ is $m$ by $m$)
$f = A^Ty$  ($A^T$ is $n$ by $m$)
We will write down the matrices $A$ and $C$ and $A^T$ for the two examples, first with fixed
ends and then with the lower end free. Forgive the simplicity of these matrices; it is their
form that is so important, especially the appearance of $A$ together with $A^T$.

The elongation $e$ is the stretching distance: how far the springs are extended.
Originally there is no stretching, because the system is lying on a table. When it becomes vertical
and upright, gravity acts. The masses move down by distances $u_1, u_2, u_3$. Each spring is
stretched or compressed by $e_i = u_i - u_{i-1}$, the difference in displacements of its ends:


Stretching of each spring
First spring: $e_1 = u_1$ (the top is fixed so $u_0 = 0$)
Second spring: $e_2 = u_2 - u_1$
Third spring: $e_3 = u_3 - u_2$
Fourth spring: $e_4 = -u_3$ (the bottom is fixed so $u_4 = 0$)

If both ends move the same distance, that spring is not stretched: $u_j = u_{j-1}$ and
$e_j = 0$. The matrix in those four equations is a 4 by 3 difference matrix $A$, and $e = Au$:

$$
\text{Stretching distances (elongations)} \qquad e = Au \ \text{ is } \
\begin{bmatrix} e_1 \\ e_2 \\ e_3 \\ e_4 \end{bmatrix}
=
\begin{bmatrix} 1 & 0 & 0 \\ -1 & 1 & 0 \\ 0 & -1 & 1 \\ 0 & 0 & -1 \end{bmatrix}
\begin{bmatrix} u_1 \\ u_2 \\ u_3 \end{bmatrix}. \tag{9}
$$


The next equation $y = Ce$ connects spring elongation $e$ with spring tension $y$. This is
Hooke’s Law $y_i = c_ie_i$ for each separate spring. It is the “constitutive law” that depends
on the material in the spring. A soft spring has small $c$, so a moderate force $y$ can produce
a large stretching $e$. Hooke’s linear law is nearly exact for real springs, before they are
overstretched and the material becomes plastic.

Since each spring has its own law, the matrix in $y = Ce$ is a diagonal matrix $C$:

$$
\text{Hooke’s Law } y = Ce \qquad
\begin{aligned} y_1 &= c_1e_1 \\ y_2 &= c_2e_2 \\ y_3 &= c_3e_3 \\ y_4 &= c_4e_4 \end{aligned}
\quad\text{is}\quad
\begin{bmatrix} y_1 \\ y_2 \\ y_3 \\ y_4 \end{bmatrix}
=
\begin{bmatrix} c_1 & & & \\ & c_2 & & \\ & & c_3 & \\ & & & c_4 \end{bmatrix}
\begin{bmatrix} e_1 \\ e_2 \\ e_3 \\ e_4 \end{bmatrix} \tag{10}
$$


Combining $e = Au$ with $y = Ce$, the spring forces (tension forces) are $y = CAu$.
Finally comes the balance equation, the most fundamental law of applied mathematics.
The internal forces from the springs balance the external forces on the masses.
Each mass is pulled or pushed by the spring force $y_j$ above it. From below it feels the
spring force $y_{j+1}$ plus $f_j$ from gravity. Thus $y_j = y_{j+1} + f_j$ or $f_j = y_j - y_{j+1}$:

$$
\text{Force balance } f = A^Ty \qquad
\begin{aligned} f_1 &= y_1 - y_2 \\ f_2 &= y_2 - y_3 \\ f_3 &= y_3 - y_4 \end{aligned}
\quad\text{is}\quad
\begin{bmatrix} f_1 \\ f_2 \\ f_3 \end{bmatrix}
=
\begin{bmatrix} 1 & -1 & 0 & 0 \\ 0 & 1 & -1 & 0 \\ 0 & 0 & 1 & -1 \end{bmatrix}
\begin{bmatrix} y_1 \\ y_2 \\ y_3 \\ y_4 \end{bmatrix} \tag{11}
$$

That matrix is $A^T$! The equation for balance of forces is $f = A^Ty$. Nature transposes
the rows and columns of the $e$–$u$ matrix to produce the $f$–$y$ matrix. This is the beauty of
the framework, that $A^T$ appears along with $A$. The three equations combine into $Ku = f$.

$e = Au$, $y = Ce$, $f = A^Ty$ combine into $A^TCAu = f$ or $Ku = f$.
$K = A^TCA$ is the stiffness matrix (mechanics).
$K = A^TCA$ is the conductance matrix (networks).


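Finite element programs spend major effort on assembling $K = A^TCA$ from thousands
of smaller pieces. We find $K$ for four springs (fixed-fixed) by multiplying $A^T$ times $CA$:

$$
\begin{bmatrix} 1 & -1 & 0 & 0 \\ 0 & 1 & -1 & 0 \\ 0 & 0 & 1 & -1 \end{bmatrix}
\begin{bmatrix} c_1 & 0 & 0 \\ -c_2 & c_2 & 0 \\ 0 & -c_3 & c_3 \\ 0 & 0 & -c_4 \end{bmatrix}
=
\begin{bmatrix} c_1 + c_2 & -c_2 & 0 \\ -c_2 & c_2 + c_3 & -c_3 \\ 0 & -c_3 & c_3 + c_4 \end{bmatrix}
$$

If all springs are identical, with $c_1 = c_2 = c_3 = c_4 = 1$, then $C = I$. The stiffness matrix
reduces to $A^TA$. It becomes the special $-1, 2, -1$ matrix $K_0$.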


Note the difference between ATA from engineering and LU from linear algebra. The 
matrix A from four springs is 4 by 3. The triangular matrices from elimination are square. 
The stiffness matrix K is assembled from ATA, and then broken up into LU. One step 
is applied mathematics, the other is computational mathematics. Each K is built from 
rectangular matrices and factored into square matrices. 

May I list some properties of $K = A^TCA$? You know almost all of them:


1. $K$ is tridiagonal, because mass 3 is not connected to mass 1.
2. $K$ is symmetric, because $C$ is symmetric and $A^T$ comes with $A$.
3. $K$ is positive definite, because $c_i > 0$ and $A$ has independent columns.
4. $K^{-1}$ is a full matrix (not sparse) with all positive entries.


Property 4 leads to an important fact about $u = K^{-1}f$: If all forces act downwards
($f_j > 0$) then all movements are downwards ($u_j > 0$). Notice that “positive” is
different from “positive definite”. $K^{-1}$ is positive ($K$ is not). Both are positive definite.


Example 1 Suppose all $c_i = c$ and $m_j = m$. Find the movements $u$ and tensions $y$.
All springs are the same and all masses are the same. But all movements and
elongations and tensions will not be the same. $K^{-1}$ includes $\frac{1}{c}$ because $A^TCA$ includes $c$:

$$
\text{Movements} \qquad u = K^{-1}f = \frac{1}{4c}
\begin{bmatrix} 3 & 2 & 1 \\ 2 & 4 & 2 \\ 1 & 2 & 3 \end{bmatrix}
\begin{bmatrix} mg \\ mg \\ mg \end{bmatrix}
= \frac{mg}{c}
\begin{bmatrix} 3/2 \\ 2 \\ 3/2 \end{bmatrix}
$$

The displacement $u_2$, for the mass in the middle, is greater than $u_1$ and $u_3$. The units are
correct: the force $mg$ divided by force per unit length $c$ gives a length $u$. Then

$$
\text{Elongations} \qquad e = Au =
\begin{bmatrix} 1 & 0 & 0 \\ -1 & 1 & 0 \\ 0 & -1 & 1 \\ 0 & 0 & -1 \end{bmatrix}
\frac{mg}{c}
\begin{bmatrix} 3/2 \\ 2 \\ 3/2 \end{bmatrix}
= \frac{mg}{c}
\begin{bmatrix} 3/2 \\ 1/2 \\ -1/2 \\ -3/2 \end{bmatrix}
$$
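
The three-step framework $u \to e \to y \to f$ for Example 1 can be traced in a short Python sketch. It assumes $c = 1$ and $mg = 1$ so the printed numbers are the multiples of $mg/c$; the helper names (`matvec`, `matmul`, `solve`) are illustrative.

```python
from fractions import Fraction

def transpose(M):
    return [list(col) for col in zip(*M)]

def matvec(M, v):
    return [sum(a * b for a, b in zip(row, v)) for row in M]

def matmul(P, Q):
    return [[sum(p * q for p, q in zip(row, col)) for col in zip(*Q)] for row in P]

def solve(M, b):
    """Gauss-Jordan elimination in exact fractions (assumes M is invertible)."""
    n = len(M)
    aug = [[Fraction(v) for v in row] + [Fraction(b[i])] for i, row in enumerate(M)]
    for col in range(n):
        pivot_row = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot_row] = aug[pivot_row], aug[col]
        pivot = aug[col][col]
        aug[col] = [v / pivot for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * w for v, w in zip(aug[r], aug[col])]
    return [row[n] for row in aug]

A = [[1, 0, 0], [-1, 1, 0], [0, -1, 1], [0, 0, -1]]    # 4 springs, 3 masses
c, mg = Fraction(1), Fraction(1)                       # assumed unit values
C = [[c if i == j else 0 for j in range(4)] for i in range(4)]

K = matmul(transpose(A), matmul(C, A))   # stiffness matrix A^T C A
f = [mg, mg, mg]
u = solve(K, f)                          # movements
e = matvec(A, u)                         # elongations e = Au
y = matvec(C, e)                         # tensions y = Ce
balance = matvec(transpose(A), y)        # should reproduce f = A^T y

print("u =", u, " e =", e, " A^T y =", balance)
```

The top two springs are stretched and the bottom two are compressed, and the tensions balance the three gravity forces.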


470 Chapter 10. Applications 


The three matrices are mixed together by $A^TCA$, and they cannot easily be untangled.
In general, $A^Ty = f$ has many solutions. And four equations $Au = e$ would usually
have no solution with three unknowns. But $A^TCA$ gives the correct solution to all three
equations in the framework. Only when $m = n$ and the matrices are square can we go from
$y = (A^T)^{-1}f$ to $e = C^{-1}y$ to $u = A^{-1}e$. We will see that now.


Fixed End and Free End 


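Remove the fourth spring. All matrices become 3 by 3. The pattern does not change! The
matrix $A$ loses its fourth row and (of course) $A^T$ loses its fourth column. The new stiffness
matrix $K_1$ becomes a product of square matrices:

$$
A_1^TC_1A_1 =
\begin{bmatrix} 1 & -1 & 0 \\ 0 & 1 & -1 \\ 0 & 0 & 1 \end{bmatrix}
\begin{bmatrix} c_1 & & \\ & c_2 & \\ & & c_3 \end{bmatrix}
\begin{bmatrix} 1 & 0 & 0 \\ -1 & 1 & 0 \\ 0 & -1 & 1 \end{bmatrix}
$$

The missing column of $A^T$ and row of $A$ multiplied the missing $c_4$. So the quickest way to
find the new $A^TCA$ is to set $c_4 = 0$ in the old one:

$$
\text{FIXED-FREE} \qquad A_1^TC_1A_1 =
\begin{bmatrix} c_1 + c_2 & -c_2 & 0 \\ -c_2 & c_2 + c_3 & -c_3 \\ 0 & -c_3 & c_3 \end{bmatrix}. \tag{12}
$$

Example 2 If $c_1 = c_2 = c_3 = 1$ and $C = I$, this is the $-1, 2, -1$ tridiagonal matrix $K_1$.
The last entry of $K_1$ is 1 instead of 2 because the spring at the bottom is free. Suppose all
$m_j = m$:

$$
\text{Fixed-free} \qquad u = K_1^{-1}f = \frac{1}{c}
\begin{bmatrix} 1 & 1 & 1 \\ 1 & 2 & 2 \\ 1 & 2 & 3 \end{bmatrix}
\begin{bmatrix} mg \\ mg \\ mg \end{bmatrix}
= \frac{mg}{c}
\begin{bmatrix} 3 \\ 5 \\ 6 \end{bmatrix}
$$

Those movements are greater than in the fixed-fixed case. The number 3 appears in $u$ because
all three masses are pulling the first spring down. The next mass moves by that 3 plus an
additional 2 from the masses below it. The third mass drops even more ($3 + 2 + 1 = 6$).
The elongations $e = Au$ in the springs display those numbers 3, 2, 1:

$$
e = Au =
\begin{bmatrix} 1 & 0 & 0 \\ -1 & 1 & 0 \\ 0 & -1 & 1 \end{bmatrix}
\frac{mg}{c}
\begin{bmatrix} 3 \\ 5 \\ 6 \end{bmatrix}
= \frac{mg}{c}
\begin{bmatrix} 3 \\ 2 \\ 1 \end{bmatrix}
$$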
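
Because the fixed-free matrices are square and triangular, the three steps can be solved one at a time: $y = (A^T)^{-1}f$, then $e = C^{-1}y$, then $u = A^{-1}e$. The Python sketch below does this by running sums instead of forming $K_1^{-1}$; the function name `fixed_free_line` is a hypothetical helper, and unit values $c_i = 1$, $mg = 1$ are assumed in the usage lines.

```python
from fractions import Fraction

def fixed_free_line(forces, spring_constants):
    """Solve A^T y = f, then e = C^{-1} y, then A u = e for a fixed-free line."""
    # Step 1: A^T is upper triangular, so y_j - y_{j+1} = f_j is solved
    # from the free end upward: each tension carries all the forces below it.
    tensions = []
    running = 0
    for f in reversed(forces):
        running += f
        tensions.append(running)
    tensions.reverse()
    # Step 2: Hooke's Law inverted, e_j = y_j / c_j.
    elongations = [Fraction(y) / c for y, c in zip(tensions, spring_constants)]
    # Step 3: A is lower triangular, so u_j - u_{j-1} = e_j is solved
    # from the fixed end downward: each movement adds up the elongations above it.
    movements = []
    total = 0
    for e in elongations:
        total += e
        movements.append(total)
    return tensions, elongations, movements

y, e, u = fixed_free_line([1, 1, 1], [1, 1, 1])
print("y =", y, " e =", e, " u =", u)
```

The tensions 3, 2, 1 count the masses hanging below each spring, and the movements 3, 5, 6 are their running totals.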
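Two Free Ends: K is Singular


Freedom at both ends means trouble. The whole line can move. $A$ is 2 by 3:

$$
\text{FREE-FREE} \qquad e = Au \qquad
\begin{bmatrix} e_1 \\ e_2 \end{bmatrix}
=
\begin{bmatrix} u_2 - u_1 \\ u_3 - u_2 \end{bmatrix}
=
\begin{bmatrix} -1 & 1 & 0 \\ 0 & -1 & 1 \end{bmatrix}
\begin{bmatrix} u_1 \\ u_2 \\ u_3 \end{bmatrix}. \tag{13}
$$

Now there is a nonzero solution to $Au = 0$. The masses can move with no stretching of
the springs. The whole line can shift by $u = (1, 1, 1)$ and this leaves $e = (0, 0)$:

$$
Au =
\begin{bmatrix} -1 & 1 & 0 \\ 0 & -1 & 1 \end{bmatrix}
\begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix}
=
\begin{bmatrix} 0 \\ 0 \end{bmatrix}
= \text{no stretching}. \tag{14}
$$

$Au = 0$ certainly leads to $A^TCAu = 0$. Then $A^TCA$ is only positive semidefinite,
without $c_1$ and $c_4$. The pivots will be $c_2$ and $c_3$ and no third pivot. The rank is only 2:

$$
\begin{bmatrix} -1 & 0 \\ 1 & -1 \\ 0 & 1 \end{bmatrix}
\begin{bmatrix} c_2 & \\ & c_3 \end{bmatrix}
\begin{bmatrix} -1 & 1 & 0 \\ 0 & -1 & 1 \end{bmatrix}
=
\begin{bmatrix} c_2 & -c_2 & 0 \\ -c_2 & c_2 + c_3 & -c_3 \\ 0 & -c_3 & c_3 \end{bmatrix} \tag{15}
$$

Two eigenvalues will be positive but $x = (1, 1, 1)$ is an eigenvector for $\lambda = 0$. We can
solve $A^TCAu = f$ only for special vectors $f$. The forces have to add to $f_1 + f_2 + f_3 = 0$,
or the whole line of springs (with both ends free) will take off like a rocket.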


Circle of Springs 


A third spring will complete the circle from mass 3 back to mass 1. This doesn’t make $K$
invertible. The stiffness matrix $K_{\text{circular}}$ is still singular:

$$
A_{\text{circular}}^TA_{\text{circular}} =
\begin{bmatrix} 1 & -1 & 0 \\ 0 & 1 & -1 \\ -1 & 0 & 1 \end{bmatrix}
\begin{bmatrix} 1 & 0 & -1 \\ -1 & 1 & 0 \\ 0 & -1 & 1 \end{bmatrix}
=
\begin{bmatrix} 2 & -1 & -1 \\ -1 & 2 & -1 \\ -1 & -1 & 2 \end{bmatrix} \tag{16}
$$

The only pivots are 2 and $\frac{3}{2}$. The eigenvalues are 3 and 3 and 0. The determinant is zero.
The nullspace still contains $x = (1, 1, 1)$, when all the masses move together.
This movement vector $(1, 1, 1)$ is in the nullspace of $A_{\text{circular}}$ and $K_{\text{circular}} = A^TCA$.
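
The pivots and the nullspace vector of the circular matrix can be confirmed by elimination. This is a minimal Python sketch with exact fractions; `pivots` is an illustrative helper that eliminates without row exchanges and stops at the first zero pivot.

```python
from fractions import Fraction

def pivots(M):
    """Eliminate without row exchanges; return the pivots, ending at the first zero."""
    U = [[Fraction(v) for v in row] for row in M]
    n = len(U)
    found = []
    for col in range(n):
        pivot = U[col][col]
        found.append(pivot)
        if pivot == 0:
            break                      # elimination cannot continue past a zero pivot
        for r in range(col + 1, n):
            factor = U[r][col] / pivot
            U[r] = [v - factor * w for v, w in zip(U[r], U[col])]
    return found

# Difference matrix for three springs in a circle (rows: u1-u3, u2-u1, u3-u2).
A = [[1, 0, -1], [-1, 1, 0], [0, -1, 1]]
K = [[sum(row[i] * row[j] for row in A) for j in range(3)] for i in range(3)]

found = pivots(K)
shift = [sum(k * 1 for k in row) for row in K]   # K times (1, 1, 1)

print("K =", K)
print("pivots:", found)
print("K(1,1,1) =", shift)
```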


May I summarize this section? I hope the example will help you connect calculus with
linear algebra, replacing differential equations by difference equations. If your step $\Delta x$ is
small enough, you will have a totally satisfactory solution.

$$
\text{The equation is } -\frac{d}{dx}\left(c\,\frac{du}{dx}\right) = f(x) \text{ with } u(0) = 0 \text{ and } \left[\,u(1) = 0 \text{ or } \frac{du}{dx}(1) = 0\,\right]
$$

Divide the bar into $N$ pieces of length $\Delta x$. Replace $du/dx$ by $Au$ and $-dy/dx$ by $A^Ty$.
Now $A$ and $A^T$ include $1/\Delta x$. The end conditions are $u_0 = 0$ and $[\,u_N = 0$ or $y_N = 0\,]$.
The three steps $-d/dx$ and $c(x)$ and $d/dx$ correspond to $A^T$ and $C$ and $A$:
$f = A^Ty$ and $y = Ce$ and $e = Au$ give $A^TCAu = f$.
This is a fundamental example in computational science and engineering.


1. Model the problem by a differential equation 
2. Discretize the differential equation to a difference equation 
3. Understand and solve the difference equation (and boundary conditions!) 


4. Interpret the solution; visualize it; redesign if needed. 


Numerical simulation has become a third branch of science, beside experiment and deduc- 
tion. Computer design of the Boeing 777 was much less expensive than a wind tunnel. 
The two texts Introduction to Applied Mathematics and Computational Science and 
Engineering (Wellesley-Cambridge Press) develop this whole subject further—see the 
course page math.mit.edu/18085 with video lectures (The lectures are also on ocw.mit.edu 
and YouTube). I hope this book helps you to see the framework behind the computations. 


Problem Set 10.2 


1 Show that $\det A_0^TC_0A_0 = c_1c_2c_3 + c_1c_3c_4 + c_1c_2c_4 + c_2c_3c_4$. Find also $\det A_1^TC_1A_1$
in the fixed-free example.


2 Invert $A_1^TC_1A_1$ in the fixed-free example by multiplying $A_1^{-1}C_1^{-1}(A_1^T)^{-1}$.


3 In the free-free case when $A^TCA$ in equation (15) is singular, add the three equations
$A^TCAu = f$ to show that we need $f_1 + f_2 + f_3 = 0$. Find a solution to $A^TCAu = f$
when the forces $f = (-1, 0, 1)$ balance themselves. Find all solutions!


4 Both end conditions for the free-free differential equation are $du/dx = 0$:

$$
-\frac{d}{dx}\left(c(x)\frac{du}{dx}\right) = f(x) \quad\text{with } \frac{du}{dx} = 0 \text{ at both ends.}
$$

Integrate both sides to show that the force $f(x)$ must balance itself, $\int f(x)\,dx = 0$,
or there is no solution. The complete solution is one particular solution $u(x)$ plus
any constant. The constant corresponds to $u = (1, 1, 1)$ in the nullspace of $A^TCA$.


5 In the fixed-free problem, the matrix $A$ is square and invertible. We can solve $A^Ty = f$
separately from $Au = e$. Do the same for the differential equation:

Solve $-\dfrac{dy}{dx} = f(x)$ with $y(1) = 0$. Graph $y(x)$ if $f(x) = 1$.
6 The 3 by 3 matrix $K_1 = A_1^TC_1A_1$ in equation (12) splits into three “element matrices”
$c_1E_1 + c_2E_2 + c_3E_3$. Write down those pieces, one for each $c$. Show how they
come from column times row multiplication of $A_1^TC_1A_1$. This is how finite element
stiffness matrices are actually assembled.


7 For five springs and four masses with both ends fixed, what are the matrices $A$ and
$C$ and $K$? With $C = I$ solve $Ku = \text{ones}(4)$.


8 Compare the solution $u = (u_1, u_2, u_3, u_4)$ in Problem 7 to the solution of the
continuous problem $-u'' = 1$ with $u(0) = 0$ and $u(1) = 0$. The parabola $u(x)$ should
correspond at $x = \frac{1}{5}, \frac{2}{5}, \frac{3}{5}, \frac{4}{5}$ to $u$. Is there a $(\Delta x)^2$ factor to account for?


9 Solve the fixed-free problem $-u'' = mg$ with $u(0) = 0$ and $u'(1) = 0$. Compare
$u(x)$ at $x = \frac{1}{3}, \frac{2}{3}, \frac{3}{3}$ with the vector $u = (3mg, 5mg, 6mg)$ in Example 2.


10 Suppose $c_1 = c_2 = c_3 = c_4 = 1$, $m_1 = 2$ and $m_2 = m_3 = 1$. Solve $A^TCAu =
(2, 1, 1)$ for this fixed-fixed line of springs. Which mass moves the most (largest $u$)?


11 (MATLAB) Find the displacements $u(1), \ldots, u(100)$ of 100 masses connected by
springs all with $c = 1$. Each force is $f(i) = .01$. Print graphs of $u$ with fixed-fixed
and fixed-free ends. Note that diag(ones(n, 1), d) is a matrix with n ones along
diagonal d. This print command will graph a vector u:


plot(u, '-+'); xlabel('mass number'); ylabel('movement'); print


12 (MATLAB) Chemical engineering has a first derivative $du/dx$ from fluid velocity as
well as $d^2u/dx^2$ from diffusion. Replace $du/dx$ by a forward difference, then a
centered difference, then a backward difference, with $\Delta x = \frac{1}{8}$. Graph your three
numerical solutions of

$$
-\frac{d^2u}{dx^2} + 10\,\frac{du}{dx} = 1 \quad\text{with } u(0) = u(1) = 0.
$$

This convection-diffusion equation appears everywhere. It transforms to the
Black-Scholes equation for option prices in mathematical finance.


Problem 12 is developed into the first MATLAB homework in my 18.085 course on 
Computational Science and Engineering at MIT. Videos on ocw.mit.edu. 
10.3 Markov Matrices, Population, and Economics


This section is about positive matrices: every $a_{ij} > 0$. The key fact is quick to state:
The largest eigenvalue is real and positive and so is its eigenvector. In economics
and ecology and population dynamics and random walks, that fact leads a long way:


Markov $\lambda_{\max} = 1$   Population $\lambda_{\max} > 1$   Consumption $\lambda_{\max} < 1$


$\lambda_{\max}$ controls the powers of $A$. We will see this first for $\lambda_{\max} = 1$.


Markov Matrices


Multiply a positive vector $u_0$ again and again by this matrix $A$:

$$
\text{Markov matrix} \qquad A = \begin{bmatrix} .8 & .3 \\ .2 & .7 \end{bmatrix} \qquad u_1 = Au_0 \qquad u_2 = Au_1 = A^2u_0
$$

After $k$ steps we have $A^ku_0$. The vectors $u_1, u_2, u_3, \ldots$ will approach a “steady state”
$u_\infty = (.6, .4)$. This final outcome does not depend on the starting vector $u_0$. For every
$u_0 = (a, 1 - a)$ we converge to the same $u_\infty = (.6, .4)$. The question is why.
The steady state equation $Au_\infty = u_\infty$ makes $u_\infty$ an eigenvector with eigenvalue 1:

$$
\text{Steady state} \qquad \begin{bmatrix} .8 & .3 \\ .2 & .7 \end{bmatrix} \begin{bmatrix} .6 \\ .4 \end{bmatrix} = \begin{bmatrix} .6 \\ .4 \end{bmatrix} = u_\infty.
$$
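
A quick numerical experiment illustrates the convergence. The Python sketch below multiplies three different starting vectors $u_0 = (a, 1 - a)$ by $A$ fifty times; the choice of 50 steps and of the three values of $a$ is an assumption made for illustration.

```python
from fractions import Fraction

A = [[Fraction(8, 10), Fraction(3, 10)],
     [Fraction(2, 10), Fraction(7, 10)]]

def step(A, u):
    """One multiplication u -> Au."""
    return [sum(a * x for a, x in zip(row, u)) for row in A]

def iterate(A, u0, k):
    """Return A^k u0 by repeated multiplication."""
    u = list(u0)
    for _ in range(k):
        u = step(A, u)
    return u

limits = []
for a in (Fraction(0), Fraction(1, 2), Fraction(1)):
    u = iterate(A, [a, 1 - a], 50)
    limits.append(u)
    print("start", (float(a), float(1 - a)), "-> after 50 steps", [float(v) for v in u])
```

Every run lands on the same vector to within rounding, and the components still add to exactly 1 at each step.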


Multiplying by $A$ does not change $u_\infty$. But this does not explain why so many vectors $u_0$
lead to $u_\infty$. Other examples might have a steady state, but it is not necessarily attractive:

$$
\text{Not Markov} \qquad B = \begin{bmatrix} 1 & 0 \\ 0 & 2 \end{bmatrix} \quad\text{has the unattractive steady state}\quad B\begin{bmatrix} 1 \\ 0 \end{bmatrix} = \begin{bmatrix} 1 \\ 0 \end{bmatrix}.
$$

In this case, the starting vector $u_0 = (0, 1)$ will give $u_1 = (0, 2)$ and $u_2 = (0, 4)$. The
second components are doubled. In the language of eigenvalues, $B$ has $\lambda = 1$ but also
$\lambda = 2$, and this produces instability. The component of $u$ along that unstable eigenvector is
multiplied by $\lambda$, and $|\lambda| > 1$ means blowup.

This section is about two special properties of $A$ that guarantee a stable steady state.
These properties define a positive Markov matrix, and $A$ above is one particular example:


Markov matrix
1. Every entry of $A$ is positive: $a_{ij} > 0$.
2. Every column of $A$ adds to 1.


Column 2 of $B$ adds to 2, not 1. When $A$ is a Markov matrix, two facts are immediate:
Because of 1: Multiplying $u_0 \ge 0$ by $A$ produces a nonnegative $u_1 = Au_0 \ge 0$.
Because of 2: If the components of $u_0$ add to 1, so do the components of $u_1 = Au_0$.
Reason: The components of $u_0$ add to 1 when $[\,1 \ \cdots \ 1\,]\,u_0 = 1$. This is true for each
column of $A$ by Property 2. Then by matrix multiplication $[\,1 \ \cdots \ 1\,]\,A = [\,1 \ \cdots \ 1\,]$:

$$
\text{Components of } Au_0 \text{ add to 1} \qquad [\,1 \ \cdots \ 1\,]\,Au_0 = [\,1 \ \cdots \ 1\,]\,u_0 = 1.
$$

The same facts apply to $u_2 = Au_1$ and $u_3 = Au_2$. Every vector $A^ku_0$ is nonnegative
with components adding to 1. These are “probability vectors.” The limit $u_\infty$ is also a
probability vector, but we have to prove that there is a limit. We will show that $\lambda_{\max} = 1$
for a positive Markov matrix.


Example 1 The fraction of rental cars in Denver starts at $\frac{1}{50} = .02$. The fraction outside
Denver is .98. Every month, 80% of the Denver cars stay in Denver (and 20% leave).
Also 5% of the outside cars come in (95% stay outside). This means that the fractions
$u_0 = (.02, .98)$ are multiplied by $A$:


First month   $A = \begin{bmatrix} .80 & .05 \\ .20 & .95 \end{bmatrix}$ leads to $u_1 = Au_0 = A \begin{bmatrix} .02 \\ .98 \end{bmatrix} = \begin{bmatrix} .065 \\ .935 \end{bmatrix}.$


Notice that $.065 + .935 = 1$. All cars are accounted for. Each step multiplies by $A$:
Next month   $u_2 = Au_1 = (.09875, .90125)$. This is $A^2 u_0$.


All these vectors are positive because $A$ is positive. Each vector $u_k$ will have its
components adding to 1. The first component has grown from .02 and cars are moving toward
Denver. What happens in the long run?


This section involves powers of matrices. The understanding of $A^k$ was our first and
best application of diagonalization. Where $A^k$ can be complicated, the diagonal matrix $\Lambda^k$
is simple. The eigenvector matrix $X$ connects them: $A^k$ equals $X \Lambda^k X^{-1}$. The new
application to Markov matrices uses the eigenvalues (in $\Lambda$) and the eigenvectors (in $X$). We
will show that $u_\infty$ is an eigenvector of $A$ corresponding to $\lambda = 1$.

Since every column of $A$ adds to 1, nothing is lost or gained. We are moving rental cars
or populations, and no cars or people suddenly appear (or disappear). The fractions add to
1 and the matrix $A$ keeps them that way. The question is how they are distributed after $k$
time periods, which leads us to $A^k$.


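Solution $A^k u_0$ gives the fractions in and out of Denver after $k$ steps. We diagonalize $A$ to
understand $A^k$. The eigenvalues are $\lambda = 1$ and $.75$ (the trace is 1.75).


$Ax = \lambda x$   $A \begin{bmatrix} .2 \\ .8 \end{bmatrix} = 1 \begin{bmatrix} .2 \\ .8 \end{bmatrix}$ and $A \begin{bmatrix} -1 \\ 1 \end{bmatrix} = .75 \begin{bmatrix} -1 \\ 1 \end{bmatrix}.$


The starting vector $u_0$ combines $x_1$ and $x_2$, in this case with coefficients 1 and .18:


Combination of eigenvectors   $u_0 = \begin{bmatrix} .02 \\ .98 \end{bmatrix} = \begin{bmatrix} .2 \\ .8 \end{bmatrix} + .18 \begin{bmatrix} -1 \\ 1 \end{bmatrix}.$


Now multiply by $A$ to find $u_1$. The eigenvectors are multiplied by $\lambda_1 = 1$ and $\lambda_2 = .75$:


Each $x$ is multiplied by $\lambda$   $u_1 = 1 \begin{bmatrix} .2 \\ .8 \end{bmatrix} + (.75)(.18) \begin{bmatrix} -1 \\ 1 \end{bmatrix}.$

Every month, another $\lambda = .75$ multiplies the vector $x_2$. The eigenvector $x_1$ is unchanged:


After $k$ steps   $u_k = A^k u_0 = 1^k \begin{bmatrix} .2 \\ .8 \end{bmatrix} + (.75)^k (.18) \begin{bmatrix} -1 \\ 1 \end{bmatrix}.$


This equation reveals what happens. The eigenvector $x_1$ with $\lambda = 1$ is the steady state.
The other eigenvector $x_2$ disappears because $|\lambda| < 1$. The more steps we take, the closer
we come to $u_\infty = (.2, .8)$. In the limit, $\frac{2}{10}$ of the cars are in Denver and $\frac{8}{10}$ are outside.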
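
As a check on the eigenvector formula for $u_k$, the sketch below compares it with direct multiplication by $A$. It is plain Python; the function names `step` and `closed_form` are illustrative, not from the text.

```python
A = [[0.80, 0.05],
     [0.20, 0.95]]
X1 = [0.2, 0.8]      # eigenvector for lambda = 1 (the steady state)
X2 = [-1.0, 1.0]     # eigenvector for lambda = .75

def step(u):
    """One month: multiply the fractions (Denver, outside) by A."""
    return [A[0][0] * u[0] + A[0][1] * u[1],
            A[1][0] * u[0] + A[1][1] * u[1]]

def closed_form(k):
    """u_k = x1 + (.18)(.75)^k x2, the combination of eigenvectors."""
    return [X1[i] + 0.18 * (0.75 ** k) * X2[i] for i in range(2)]

u = [0.02, 0.98]
for k in range(41):
    if k in (0, 1, 2, 10, 40):
        print(k, u, closed_form(k))
    u = step(u)
```

The two columns of output agree up to roundoff, and both move toward $(.2, .8)$ at the rate $(.75)^k$.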


This is the pattern for Markov chains, even starting from $u_0 = (0, 1)$:


If $A$ is a positive Markov matrix (entries $a_{ij} > 0$, each column adds to 1), then
$\lambda_1 = 1$ is larger than any other eigenvalue. The eigenvector $x_1$ is the steady state:


$u_k = x_1 + c_2 (\lambda_2)^k x_2 + \cdots + c_n (\lambda_n)^k x_n$ always approaches $u_\infty = x_1$.


The first point is to see that $\lambda = 1$ is an eigenvalue of $A$. Reason: Every column of
$A - I$ adds to $1 - 1 = 0$. The rows of $A - I$ add up to the zero row. Those rows are linearly
dependent, so $A - I$ is singular. Its determinant is zero and $\lambda = 1$ is an eigenvalue.

The second point is that no eigenvalue can have $|\lambda| > 1$. With such an eigenvalue,
the powers $A^k$ would grow. But $A^k$ is also a Markov matrix! $A^k$ has positive entries
still adding to 1, and that leaves no room to get large.


A lot of attention is paid to the possibility that another eigenvalue has $|\lambda| = 1$.
Example 2   $A = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}$ has no steady state because $\lambda_2 = -1$.


This matrix sends all cars from inside Denver to outside, and vice versa.
The powers $A^k$ alternate between $A$ and $I$. The second eigenvector $x_2 = (-1, 1)$ will be
multiplied by $\lambda_2 = -1$ at every step and does not become smaller: No steady state.


Suppose the entries of $A$ or any power of $A$ are all positive (zero is not allowed).
In this “regular” or “primitive” case, $\lambda = 1$ is strictly larger than any other eigenvalue.
The powers $A^k$ approach the rank one matrix that has the steady state in every column.


Example 3 (“Everybody moves”) Start with three groups. At each time step, half of
group 1 goes to group 2 and the other half goes to group 3. The other groups also split in
half and move. Take one step from the starting populations $p_1, p_2, p_3$:


New populations   $u_1 = Au_0 = \begin{bmatrix} 0 & \frac12 & \frac12 \\ \frac12 & 0 & \frac12 \\ \frac12 & \frac12 & 0 \end{bmatrix} \begin{bmatrix} p_1 \\ p_2 \\ p_3 \end{bmatrix} = \begin{bmatrix} \frac12 p_2 + \frac12 p_3 \\ \frac12 p_1 + \frac12 p_3 \\ \frac12 p_1 + \frac12 p_2 \end{bmatrix}$


$A$ is a Markov matrix. Nobody is born or lost. $A$ contains zeros, which gave trouble in
Example 2. But after two steps in this new example, the zeros disappear from $A^2$:


Two-step matrix   $u_2 = A^2 u_0 = \begin{bmatrix} \frac12 & \frac14 & \frac14 \\ \frac14 & \frac12 & \frac14 \\ \frac14 & \frac14 & \frac12 \end{bmatrix} \begin{bmatrix} p_1 \\ p_2 \\ p_3 \end{bmatrix}$

The eigenvalues of $A$ are $\lambda_1 = 1$ (because $A$ is Markov) and $\lambda_2 = \lambda_3 = -\frac12$. For $\lambda = 1$,
the eigenvector $x_1 = (\frac13, \frac13, \frac13)$ will be the steady state. When three equal populations
split in half and move, the populations are again equal. Starting from $u_0 = (8, 16, 32)$,
the Markov chain approaches its steady state:


$u_0 = \begin{bmatrix} 8 \\ 16 \\ 32 \end{bmatrix}$   $u_1 = \begin{bmatrix} 24 \\ 20 \\ 12 \end{bmatrix}$   $u_2 = \begin{bmatrix} 16 \\ 18 \\ 22 \end{bmatrix}$   $u_3 = \begin{bmatrix} 20 \\ 19 \\ 17 \end{bmatrix}$


The step to $u_4$ will split some people in half. This cannot be helped. The total population
is $8 + 16 + 32 = 56$ at every step. The steady state is 56 times $(\frac13, \frac13, \frac13)$. You can see the
three populations approaching, but never reaching, their final limits $56/3$.
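
The chain in Example 3 is easy to reproduce exactly. This is a small sketch using Python's `fractions` module so that the half-people in $u_4$ appear as exact fractions; the names `matvec` and `history` are illustrative.

```python
from fractions import Fraction

half = Fraction(1, 2)
A = [[0, half, half],
     [half, 0, half],
     [half, half, 0]]

def matvec(M, v):
    """Multiply the matrix M (a list of rows) by the vector v."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

u = [Fraction(8), Fraction(16), Fraction(32)]
history = [u]
for _ in range(4):
    u = matvec(A, u)
    history.append(u)

for k, u_k in enumerate(history):
    # The total stays at 56 while the three groups move toward 56/3.
    print(k, [str(x) for x in u_k], "total =", sum(u_k))
```

Because $\lambda_2 = \lambda_3 = -\frac12$, the distance from the limit $56/3$ is cut in half (with a sign change) at every step.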

Challenge Problem 6.7.16 created a Markov matrix $A$ from the number of links
between websites. The steady state $u$ will give the Google rankings. Google finds $u_\infty$ by a
random walk that follows links (random surfing). That eigenvector comes from counting
the fraction of visits to each website, a quick way to compute the steady state.


The size $|\lambda_2|$ of the second eigenvalue controls the speed of convergence to steady state.


Perron-Frobenius Theorem 


One matrix theorem dominates this subject. The Perron-Frobenius Theorem applies when
all $a_{ij} \ge 0$. There is no requirement that columns add to 1. We prove the neatest form,
when all $a_{ij} > 0$: any positive matrix $A$ (not necessarily positive definite!).


Perron-Frobenius for $A > 0$   All numbers in $Ax = \lambda_{\max} x$ are strictly positive.


Proof The key idea is to look at all numbers $t$ such that $Ax \ge tx$ for some nonnegative
vector $x$ (other than $x = 0$). We are allowing inequality in $Ax \ge tx$ in order to have many
small positive candidates $t$. For the largest value $t_{\max}$ (which is attained), we will show
that equality holds: $Ax = t_{\max} x$.

Otherwise, if $Ax \ge t_{\max} x$ is not an equality, multiply by $A$. Because $A$ is positive
that produces a strict inequality $A^2 x > t_{\max} Ax$. Therefore the positive vector $y = Ax$
satisfies $Ay > t_{\max} y$, and $t_{\max}$ could be increased. This contradiction forces the equality
$Ax = t_{\max} x$, and we have an eigenvalue. Its eigenvector $x$ is positive because on the left
side of that equality, $Ax$ is sure to be positive.

To see that no eigenvalue can be larger than $t_{\max}$, suppose $Az = \lambda z$. Since $\lambda$ and $z$
may involve negative or complex numbers, we take absolute values: $|\lambda||z| = |Az| \le A|z|$
by the “triangle inequality.” This $|z|$ is a nonnegative vector, so this $|\lambda|$ is one of the
possible candidates $t$. Therefore $|\lambda|$ cannot exceed $t_{\max}$, which must be $\lambda_{\max}$.


Population Growth


Divide the population into three age groups: age $< 20$, age 20 to 39, and age 40 to 59.
At year $T$ the sizes of those groups are $n_1, n_2, n_3$. Twenty years later, the sizes have
changed for three reasons: births, deaths, and getting older.


1. Reproduction   $n_1^{\text{new}} = F_1 n_1 + F_2 n_2 + F_3 n_3$ gives a new generation
2. Survival   $n_2^{\text{new}} = P_1 n_1$ and $n_3^{\text{new}} = P_2 n_2$ gives the older generations

The fertility rates are $F_1, F_2, F_3$ ($F_2$ largest). The Leslie matrix $A$ might look like this:


$\begin{bmatrix} n_1 \\ n_2 \\ n_3 \end{bmatrix}^{\text{new}} = \begin{bmatrix} F_1 & F_2 & F_3 \\ P_1 & 0 & 0 \\ 0 & P_2 & 0 \end{bmatrix} \begin{bmatrix} n_1 \\ n_2 \\ n_3 \end{bmatrix} = \begin{bmatrix} .04 & 1.1 & .01 \\ .98 & 0 & 0 \\ 0 & .92 & 0 \end{bmatrix} \begin{bmatrix} n_1 \\ n_2 \\ n_3 \end{bmatrix}$


This is population projection in its simplest form, the same matrix $A$ at every step. In
a realistic model, $A$ will change with time (from the environment or internal factors).
Professors may want to include a fourth group, age $\ge 60$, but we don’t allow it.


The matrix has $A \ge 0$ but not $A > 0$. The Perron-Frobenius theorem still applies
because $A^3 > 0$. The largest eigenvalue is $\lambda_{\max} \approx 1.06$. You can watch the generations
move, starting from $n_2 = 1$ in the middle generation:


$\text{eig}(A) = \begin{matrix} 1.06 \\ -1.01 \\ -0.01 \end{matrix}$   $A^2 = \begin{bmatrix} 1.08 & 0.05 & .00 \\ 0.04 & 1.08 & .01 \\ 0.90 & 0 & 0 \end{bmatrix}$   $A^3 = \begin{bmatrix} 0.10 & 1.19 & .01 \\ 1.06 & 0.05 & .00 \\ 0.04 & 0.99 & .01 \end{bmatrix}$


A fast start would come from $u_0 = (0, 1, 0)$. That middle group will reproduce 1.1 and
also survive .92. The newest and oldest generations are in $u_1 = (1.1, 0, .92)$ = column 2 of
$A$. Then $u_2 = Au_1 = A^2 u_0$ is the second column of $A^2$. The early numbers (transients)
depend a lot on $u_0$, but the asymptotic growth rate $\lambda_{\max}$ is the same from every start.
Its eigenvector $x = (.63, .58, .51)$ shows all three groups growing steadily together.
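
The growth rate and the steady age distribution can be estimated by repeated multiplication. The sketch below uses the power method, which is an assumption of this example rather than the method used in the text (the text quotes `eig(A)`). The function name `power_iteration` and the step count are illustrative.

```python
A = [[0.04, 1.1, 0.01],
     [0.98, 0.0, 0.0],
     [0.0, 0.92, 0.0]]   # Leslie matrix from the text

def matvec(M, v):
    """Multiply the matrix M (a list of rows) by the vector v."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def power_iteration(M, u0, steps=2000):
    """Estimate the largest eigenvalue and a unit-length eigenvector."""
    u = list(u0)
    growth = 0.0
    for _ in range(steps):
        v = matvec(M, u)
        growth = sum(v) / sum(u)               # total population ratio
        length = sum(c * c for c in v) ** 0.5
        u = [c / length for c in v]            # rescale to avoid overflow
    return growth, u

lam, x = power_iteration(A, [0.0, 1.0, 0.0])   # start in the middle group
print(round(lam, 4), [round(c, 2) for c in x])
```

Convergence is slow here because the second eigenvalue (about $-1.01$) is close in size to $\lambda_{\max}$, which is why the sketch takes many steps.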


Caswell’s book on Matrix Population Models emphasizes sensitivity analysis. The
model is never exactly right. If the $F$’s or $P$’s in the matrix change by 10%, does $\lambda_{\max}$
go below 1 (which means extinction)? Problem 19 will show that a matrix change $\Delta A$
produces an eigenvalue change $\Delta\lambda = y^T (\Delta A) x$. Here $x$ and $y^T$ are the right and left
eigenvectors of $A$, with $Ax = \lambda x$ and $A^T y = \lambda y$.


Linear Algebra in Economics: The Consumption Matrix 


A long essay about linear algebra in economics would be out of place here. A short note 
about one matrix seems reasonable. The consumption matrix tells how much of each input 
goes into a unit of output. This describes the manufacturing side of the economy.

Consumption matrix We have $n$ industries like chemicals, food, and oil. To produce a
unit of chemicals may require .2 units of chemicals, .3 units of food, and .4 units of oil.
Those numbers go into row 1 of the consumption matrix $A$:


$\begin{bmatrix} \text{chemical output} \\ \text{food output} \\ \text{oil output} \end{bmatrix} = \begin{bmatrix} .2 & .3 & .4 \\ .4 & .4 & .1 \\ .5 & .1 & .3 \end{bmatrix} \begin{bmatrix} \text{chemical input} \\ \text{food input} \\ \text{oil input} \end{bmatrix}$


Row 2 shows the inputs to produce food: a heavy use of chemicals and food, not so much
oil. Row 3 of $A$ shows the inputs consumed to refine a unit of oil. The real consumption
matrix for the United States in 1958 contained 83 industries. The models in the 1990’s
are much larger and more precise. We chose a consumption matrix that has a convenient
eigenvector.

Now comes the question: Can this economy meet demands $y_1, y_2, y_3$ for chemicals,
food, and oil? To do that, the inputs $p_1, p_2, p_3$ will have to be higher, because part of $p$
is consumed in producing $y$. The input is $p$ and the consumption is $Ap$, which leaves the
output $p - Ap$. This net production is what meets the demand $y$:


Problem   Find a vector $p$ such that $p - Ap = y$ or $p = (I - A)^{-1} y$.


Apparently the linear algebra question is whether $I - A$ is invertible. But there is
more to the problem. The vector $y$ of required outputs is nonnegative, and so is $A$. The
production levels in $p = (I - A)^{-1} y$ must also be nonnegative. The real question is:


When is $(I - A)^{-1}$ a nonnegative matrix?


This is the test on $(I - A)^{-1}$ for a productive economy, which can meet any demand.
If $A$ is small compared to $I$, then $Ap$ is small compared to $p$. There is plenty of output.
If $A$ is too large, then production consumes too much and the demand $y$ cannot be met.


“Small” or “large” is decided by the largest eigenvalue $\lambda_1$ of $A$ (which is positive):


If $\lambda_1 > 1$ then $(I - A)^{-1}$ has negative entries
If $\lambda_1 = 1$ then $(I - A)^{-1}$ fails to exist
If $\lambda_1 < 1$ then $(I - A)^{-1}$ is nonnegative as desired.


The main point is that last one. The reasoning uses a nice formula for $(I - A)^{-1}$, which
we give now. The most important infinite series in mathematics is the geometric series
$1 + x + x^2 + \cdots$. This series adds up to $1/(1 - x)$ provided $x$ lies between $-1$ and $1$.
When $x = 1$ the series is $1 + 1 + 1 + \cdots = \infty$. When $|x| \ge 1$ the terms $x^n$ don’t go to
zero and the series has no chance to converge.

The nice formula for $(I - A)^{-1}$ is the geometric series of matrices:


Geometric series   $(I - A)^{-1} = I + A + A^2 + A^3 + \cdots$

If you multiply the series $S = I + A + A^2 + \cdots$ by $A$, you get the same series except
for $I$. Therefore $S - AS = I$, which is $(I - A)S = I$. The series adds to $S = (I - A)^{-1}$
if it converges. And it converges if all eigenvalues of $A$ have $|\lambda| < 1$.

In our case $A \ge 0$. All terms of the series are nonnegative. Its sum is $(I - A)^{-1} \ge 0$.


Example 4   $A = \begin{bmatrix} .2 & .3 & .4 \\ .4 & .4 & .1 \\ .5 & .1 & .3 \end{bmatrix}$ has $\lambda_{\max} = .9$ and $(I - A)^{-1} = \frac{10}{93} \begin{bmatrix} 41 & 25 & 27 \\ 33 & 36 & 24 \\ 34 & 23 & 36 \end{bmatrix}.$


This economy is productive. $A$ is small compared to $I$, because $\lambda_{\max}$ is .9. To meet the
demand $y$, start from $p = (I - A)^{-1} y$. Then $Ap$ is consumed in production, leaving
$p - Ap$. This is $(I - A)p = y$, and the demand is met.
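
The geometric series for Example 4 can be summed numerically. The sketch below is plain Python; the demand vector `y` is a hypothetical choice made only for illustration, and the helper names are not from the text.

```python
A = [[0.2, 0.3, 0.4],
     [0.4, 0.4, 0.1],
     [0.5, 0.1, 0.3]]

def matmul(P, Q):
    """Product of two square matrices stored as lists of rows."""
    n = len(P)
    return [[sum(P[i][k] * Q[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def geometric_series(M, terms):
    """Partial sum I + M + M^2 + ... + M^terms."""
    n = len(M)
    identity = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    total, power = identity, identity
    for _ in range(terms):
        power = matmul(power, M)
        total = [[total[i][j] + power[i][j] for j in range(n)] for i in range(n)]
    return total

S = geometric_series(A, 400)          # (.9)^400 is negligible
y = [2.0, 6.0, 4.0]                   # hypothetical demand
p = [sum(S[i][j] * y[j] for j in range(3)) for i in range(3)]
net = [p[i] - sum(A[i][j] * p[j] for j in range(3)) for i in range(3)]

print([[round(93 * s / 10, 6) for s in row] for row in S])
print([round(c, 6) for c in net])     # net production p - Ap
```

Every row of $A$ adds to .9, so every row of the sum adds to $1/(1 - .9) = 10$. That is a quick check on the entries of $(I - A)^{-1}$.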


Example 5   $A = \begin{bmatrix} 0 & 4 \\ 1 & 0 \end{bmatrix}$ has $\lambda_{\max} = 2$ and $(I - A)^{-1} = -\frac{1}{3} \begin{bmatrix} 1 & 4 \\ 1 & 1 \end{bmatrix}.$


This consumption matrix $A$ is too large. Demands can’t be met, because production
consumes more than it yields. The series $I + A + A^2 + \cdots$ does not converge to $(I - A)^{-1}$
because $\lambda_{\max} > 1$. The series is growing while $(I - A)^{-1}$ is actually negative.
In the same way $1 + 2 + 4 + \cdots$ is not really $1/(1 - 2) = -1$. But not entirely false!


Problem Set 10.3 


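Questions 1-12 are about Markov matrices and their eigenvalues and powers.


1 Find the eigenvalues of this Markov matrix (their sum is the trace):

$A = \begin{bmatrix} .90 & .15 \\ .10 & .85 \end{bmatrix}.$

What is the steady state eigenvector for the eigenvalue $\lambda_1 = 1$?


2 Diagonalize the Markov matrix in Problem 1 to $A = X \Lambda X^{-1}$ by finding its other
eigenvector:

$A = \Big[ \quad \Big] \begin{bmatrix} 1 & 0 \\ 0 & .75 \end{bmatrix} \Big[ \quad \Big].$

What is the limit of $A^k = X \Lambda^k X^{-1}$ when $\Lambda^k = \begin{bmatrix} 1 & 0 \\ 0 & .75^k \end{bmatrix}$ approaches $\begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix}$?

3 What are the eigenvalues and steady state eigenvectors for these Markov matrices?

$A = \begin{bmatrix} 1 & .2 \\ 0 & .8 \end{bmatrix}$   $A = \begin{bmatrix} .2 & 1 \\ .8 & 0 \end{bmatrix}$   $A = \begin{bmatrix} \frac12 & \frac14 & \frac14 \\ \frac14 & \frac12 & \frac14 \\ \frac14 & \frac14 & \frac12 \end{bmatrix}.$
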
4 For every 4 by 4 Markov matrix, what eigenvector of $A^T$ corresponds to the (known)
eigenvalue $\lambda = 1$?


5 Every year 2% of young people become old and 3% of old people become dead.
(No births.) Find the steady state for

$\begin{bmatrix} \text{young} \\ \text{old} \\ \text{dead} \end{bmatrix}_{k+1} = \begin{bmatrix} .98 & .00 & 0 \\ .02 & .97 & 0 \\ .00 & .03 & 1 \end{bmatrix} \begin{bmatrix} \text{young} \\ \text{old} \\ \text{dead} \end{bmatrix}_k.$


6 For a Markov matrix, the sum of the components of $x$ equals the sum of the
components of $Ax$. If $Ax = \lambda x$ with $\lambda \ne 1$, prove that the components of this non-steady
eigenvector $x$ add to zero.


7 Find the eigenvalues and eigenvectors of $A$. Explain why $A^k$ approaches $A^\infty$:

$A = \begin{bmatrix} .8 & .3 \\ .2 & .7 \end{bmatrix}$   $A^\infty = \begin{bmatrix} .6 & .6 \\ .4 & .4 \end{bmatrix}.$

Challenge problem: Which Markov matrices produce that steady state $(.6, .4)$?

8 The steady state eigenvector of a permutation matrix is $(\frac14, \frac14, \frac14, \frac14)$. This is not
approached when $u_0 = (0, 0, 0, 1)$. What are $u_1$ and $u_2$ and $u_3$ and $u_4$? What are
the four eigenvalues of $P$, which solve $\lambda^4 = 1$?

Permutation matrix = Markov matrix   $P = \begin{bmatrix} 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \\ 1 & 0 & 0 & 0 \end{bmatrix}$

9 Prove that the square of a Markov matrix is also a Markov matrix.

10 If $A = \begin{bmatrix} a & b \\ c & d \end{bmatrix}$ is a Markov matrix, its eigenvalues are 1 and ____. The steady state
eigenvector is $x_1 =$ ____.


11 Complete $A$ to a Markov matrix and find the steady state eigenvector. When $A$ is a
symmetric Markov matrix, why is $x_1 = (1, \ldots, 1)$ its steady state?


12 A Markov differential equation is not $du/dt = Au$ but $du/dt = (A - I)u$. The
diagonal is negative, the rest of $A - I$ is positive. The columns add to zero, not 1.

Find $\lambda_1$ and $\lambda_2$ for $B = A - I = \begin{bmatrix} -.2 & .3 \\ .2 & -.3 \end{bmatrix}$. Why does $A - I$ have $\lambda_1 = 0$?

When $e^{\lambda_1 t}$ and $e^{\lambda_2 t}$ multiply $x_1$ and $x_2$, what is the steady state as $t \to \infty$?

Questions 13-15 are about linear algebra in economics.


13 Each row of the consumption matrix in Example 4 adds to .9. Why does that make
$\lambda = .9$ an eigenvalue, and what is the eigenvector?


14 Multiply $I + A + A^2 + A^3 + \cdots$ by $I - A$ to get $I$. The series adds to $(I - A)^{-1}$.
For $A = \begin{bmatrix} 0 & \frac12 \\ 1 & 0 \end{bmatrix}$, find $A^2$ and $A^3$ and use the pattern to add up the series.


15 For which of these matrices does $I + A + A^2 + \cdots$ yield a nonnegative matrix
$(I - A)^{-1}$? Then the economy can meet any demand:
0 1 _|0 4 eee 1 
A=|) i ial 3| A=|5 ale 


If the demands are $y = (2, 6)$, what are the vectors $p = (I - A)^{-1} y$?


16 (Markov again) This matrix has zero determinant. What are its eigenvalues? 


3 


OR N 
to 


A 
Find the limits of $A^k u_0$ starting from $u_0 = (1, 0, 0)$ and then $u_0 = (100, 0, 0)$.

17 If $A$ is a Markov matrix, why doesn’t $I + A + A^2 + \cdots$ add up to $(I - A)^{-1}$?


18 For the Leslie matrix show that $\det(A - \lambda I) = 0$ gives $F_1 \lambda^2 + F_2 P_1 \lambda + F_3 P_1 P_2 = \lambda^3$.
The right side $\lambda^3$ is larger as $\lambda \to \infty$. The left side is larger at $\lambda = 1$ if
$F_1 + F_2 P_1 + F_3 P_1 P_2 > 1$. In that case the two sides are equal at an eigenvalue
$\lambda_{\max} > 1$: growth.


19 Sensitivity of eigenvalues: A matrix change $\Delta A$ produces eigenvalue changes $\Delta\Lambda$.
Those changes $\Delta\lambda_1, \ldots, \Delta\lambda_n$ are on the diagonal of $(X^{-1} \Delta A\, X)$. Challenge:


Start from $AX = X\Lambda$. The eigenvectors and eigenvalues change by $\Delta X$ and $\Delta\Lambda$:

$(A + \Delta A)(X + \Delta X) = (X + \Delta X)(\Lambda + \Delta\Lambda)$ becomes $A(\Delta X) + (\Delta A)X = X(\Delta\Lambda) + (\Delta X)\Lambda.$


Small terms $(\Delta A)(\Delta X)$ and $(\Delta X)(\Delta\Lambda)$ are ignored. Multiply the last equation by
$X^{-1}$. From the inner terms, the diagonal part of $X^{-1}(\Delta A)X$ gives $\Delta\Lambda$ as we want.
Why do the outer terms $X^{-1} A\, \Delta X$ and $X^{-1} \Delta X\, \Lambda$ cancel on the diagonal?


Explain $X^{-1} A = \Lambda X^{-1}$ and then $\text{diag}(\Lambda X^{-1} \Delta X) = \text{diag}(X^{-1} \Delta X\, \Lambda)$.


20 Suppose $B > A > 0$, meaning that each $b_{ij} > a_{ij} > 0$. How does the
Perron-Frobenius discussion show that $\lambda_{\max}(B) > \lambda_{\max}(A)$?

10.4 Linear Programming


Linear programming is linear algebra plus two new ideas: inequalities and minimization.
The starting point is still a matrix equation $Ax = b$. But the only acceptable solutions
are nonnegative. We require $x \ge 0$ (meaning that no component of $x$ can be negative).
The matrix has $n > m$, more unknowns than equations. If there are any solutions $x \ge 0$
to $Ax = b$, there are probably a lot. Linear programming picks the solution $x^* \ge 0$
that minimizes the cost:


The cost is $c_1 x_1 + \cdots + c_n x_n$. The winning vector $x^*$ is
the nonnegative solution of $Ax = b$ that has smallest cost.


Thus a linear programming problem starts with a matrix $A$ and two vectors $b$ and $c$:
i) $A$ has $n > m$: for example $A = [\,1 \ 1 \ 2\,]$ (one equation, three unknowns)
ii) $b$ has $m$ components for $m$ equations $Ax = b$: for example $b = [\,4\,]$
iii) The cost vector $c$ has $n$ components: for example $c = [\,5 \ 3 \ 8\,]$.


Then the problem is to minimize $c \cdot x$ subject to the requirements $Ax = b$ and $x \ge 0$:
Minimize $5x_1 + 3x_2 + 8x_3$ subject to $x_1 + x_2 + 2x_3 = 4$ and $x_1, x_2, x_3 \ge 0$.


We jumped right into the problem, without explaining where it comes from. Linear
programming is actually the most important application of mathematics to management.
Development of the fastest algorithm and fastest code is highly competitive. You will see that
finding $x^*$ is harder than solving $Ax = b$, because of the extra requirements: $x^* \ge 0$ and
minimum cost $c^T x^*$. We will explain the background, and the famous simplex method,
and interior point methods, after solving the example.

Look first at the “constraints”: $Ax = b$ and $x \ge 0$. The equation $x_1 + x_2 + 2x_3 = 4$
gives a plane in three dimensions. The nonnegativity $x_1 \ge 0$, $x_2 \ge 0$, $x_3 \ge 0$ chops the
plane down to a triangle. The solution $x^*$ must lie in the triangle $PQR$ in Figure 10.5.

Inside that triangle, all components of $x$ are positive. On the edges of $PQR$,
one component is zero. At the corners $P$ and $Q$ and $R$, two components are zero. The
optimal solution $x^*$ will be one of those corners! We will now show why.

The triangle contains all vectors $x$ that satisfy $Ax = b$ and $x \ge 0$. Those $x$’s are called
feasible points, and the triangle is the feasible set. These points are the allowed candidates
in the minimization of $c \cdot x$, which is the final step:


Find $x^*$ in the triangle $PQR$ to minimize the cost $5x_1 + 3x_2 + 8x_3$.


The vectors that have zero cost lie on the plane $5x_1 + 3x_2 + 8x_3 = 0$. That plane does
not meet the triangle. We cannot achieve zero cost, while meeting the requirements on $x$.
So increase the cost $C$ until the plane $5x_1 + 3x_2 + 8x_3 = C$ does meet the triangle.
As $C$ increases, we have parallel planes moving toward the triangle.


Figure 10.5: The triangle contains all nonnegative solutions: $Ax = b$ and $x \ge 0$. The
lowest cost solution $x^*$ is a corner $P$, $Q$, or $R$ of this feasible set.
[Figure labels: Example with four homework problems. $Ax = b$ is the plane $x_1 + x_2 + 2x_3 = 4$.
Triangle has $x_1 \ge 0$, $x_2 \ge 0$, $x_3 \ge 0$. Corners have 2 zero components.
Cost $c^T x = 5x_1 + 3x_2 + 8x_3$. $R = (0, 0, 2)$ (2 hours by computer). $P = (4, 0, 0)$ (4 hours by Ph.D.).]


The first plane $5x_1 + 3x_2 + 8x_3 = C$ to touch the triangle has minimum cost $C$.
The point where it touches is the solution $x^*$. This touching point must be one of the
corners $P$ or $Q$ or $R$. A moving plane could not reach the inside of the triangle before it
touches a corner! So check the cost $5x_1 + 3x_2 + 8x_3$ at each corner:


$P = (4, 0, 0)$ costs 20     $Q = (0, 4, 0)$ costs 12     $R = (0, 0, 2)$ costs 16.


The winner is $Q$. Then $x^* = (0, 4, 0)$ solves the linear programming problem.

If the cost vector $c$ is changed, the parallel planes are tilted. For small changes, $Q$ is still
the winner. For the cost $c \cdot x = 5x_1 + 4x_2 + 7x_3$, the optimum $x^*$ moves to $R = (0, 0, 2)$.
The minimum cost is now $7 \cdot 2 = 14$.
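
The corner check can be written out in a few lines. This sketch handles only the special case of one equation $a \cdot x = b$ (so $m = 1$ and each corner has a single nonzero component); the function names are illustrative and it is not a general linear programming solver.

```python
def corners_one_equation(a, b):
    """Corners of {x >= 0, a . x = b}: one nonzero component b / a_j each."""
    corners = []
    for j, a_j in enumerate(a):
        if a_j != 0 and b / a_j >= 0:
            x = [0.0] * len(a)
            x[j] = b / a_j
            corners.append(x)
    return corners

def cost(c, x):
    """The linear cost c . x."""
    return sum(c_j * x_j for c_j, x_j in zip(c, x))

def best_corner(c, a, b):
    """Corner of minimum cost, found by checking every corner."""
    return min(corners_one_equation(a, b), key=lambda x: cost(c, x))

a, b = [1, 1, 2], 4
for c in ([5, 3, 8], [5, 4, 7]):
    for x in corners_one_equation(a, b):
        print(c, x, cost(c, x))
    print("winner for", c, "is", best_corner(c, a, b))
```

With $c = (5, 3, 8)$ the winner is $Q$, and with $c = (5, 4, 7)$ it is $R$, matching the costs listed above.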


Note 1 Some linear programs maximize profit instead of minimizing cost. The
mathematics is almost the same. The parallel planes start with a large value of $C$, instead of a small
value. They move toward the origin (instead of away), as $C$ gets smaller. The first touching
point is still a corner.


Note 2 The requirements $Ax = b$ and $x \ge 0$ could be impossible to satisfy. The equation
$x_1 + x_2 + x_3 = -1$ cannot be solved with $x \ge 0$. That feasible set is empty.


Note 3 It could also happen that the feasible set is unbounded. If the requirement is
$x_1 + x_2 - 2x_3 = 4$, the large positive vector $(100, 100, 98)$ is now a candidate. So is
the larger vector $(1000, 1000, 998)$. The plane $Ax = b$ is no longer chopped off to a
triangle. The two corners $P$ and $Q$ are still candidates for $x^*$, but $R$ moved to infinity.


Note 4 With an unbounded feasible set, the minimum cost could be $-\infty$ (minus infinity).
Suppose the cost is $-x_1 - x_2 + x_3$. Then the vector $(100, 100, 98)$ costs $C = -102$.
The vector $(1000, 1000, 998)$ costs $C = -1002$. We are being paid to include $x_1$ and $x_2$,
instead of paying a cost. In realistic applications this will not happen. But it is theoretically
possible that $A$, $b$, and $c$ can produce unexpected triangles and costs.


[Figure 10.5 label: $Q = (0, 4, 0)$ (4 hours by student)]

The Primal and Dual Problems


This first problem will fit $A$, $b$, $c$ in that example. The unknowns $x_1, x_2, x_3$ represent hours
of work by a Ph.D. and a student and a machine. The costs per hour are \$5, \$3, and \$8.
(I apologize for such low pay.) The number of hours cannot be negative: $x_1 \ge 0$, $x_2 \ge 0$,
$x_3 \ge 0$. The Ph.D. and the student get through one homework problem per hour. The
machine solves two problems in one hour. In principle they can share out the homework,
which has four problems to be solved: $x_1 + x_2 + 2x_3 = 4$.


The problem is to finish the four problems at minimum cost $c^T x$.


If all three are working, the job takes one hour: $x_1 = x_2 = x_3 = 1$. The cost is
$5 + 3 + 8 = 16$. But certainly the Ph.D. should be put out of work by the student (who
is just as fast and costs less; this problem is getting realistic). When the student works
two hours and the machine works one, the cost is $6 + 8$ and all four problems get solved.
We are on the edge $QR$ of the triangle because the Ph.D. is not working: $x_1 = 0$.
But the best point is all work by student (at $Q$) or all work by machine (at $R$). In
this example the student solves four problems in four hours for \$12, the minimum cost.


With only one equation in $Ax = b$, the corner $(0, 4, 0)$ has only one nonzero
component. When $Ax = b$ has $m$ equations, corners have $m$ nonzeros. We solve
$Ax = b$ for those $m$ variables, with $n - m$ free variables set to zero. But unlike Chapter 3,
we don’t know which $m$ variables to choose.

The number of possible corners is the number of ways to choose $m$ components out
of $n$. This number “$n$ choose $m$” is heavily involved in gambling and probability. With
$n = 20$ unknowns and $m = 8$ equations (still small numbers), the “feasible set” can have
$20!/8!\,12!$ corners. That number is $(20)(19) \cdots (13)/8! = 125{,}970$.

Checking three corners for the minimum cost was fine. Checking 125,970 corners (and far
more when $n$ and $m$ are realistic) is not the way to go. The simplex method described below is much faster.
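
The count of possible corners is worth computing directly, because two different numbers are easy to confuse here. The sketch below uses `math.comb` and `math.perm` from the Python standard library (available in Python 3.8 and later).

```python
import math

n, m = 20, 8

# Unordered choices of m basic variables out of n: n! / (m! (n - m)!)
corner_bound = math.comb(n, m)

# Ordered choices: the product 20 * 19 * ... * 13 = n! / (n - m)!
ordered = math.perm(n, m)

print(corner_bound)                         # 125970
print(ordered)                              # 5079110400
print(ordered // math.factorial(m))         # 125970 again
```

The product $(20)(19) \cdots (13)$ by itself is $20!/12! = 5{,}079{,}110{,}400$. Dividing by $8!$ removes the ordering and gives $20!/8!\,12! = 125{,}970$. Either way the count grows so quickly with $n$ and $m$ that checking every corner is impractical.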


The Dual Problem In linear programming, problems come in pairs. There is a minimum
problem and a maximum problem: the original and its “dual.” The original problem was
specified by a matrix $A$ and two vectors $b$ and $c$. The dual problem transposes $A$ and
switches $b$ and $c$: Maximize $b \cdot y$. Here is the dual to our example:


A cheater offers to solve homework problems by selling the answers.
The charge is $y$ dollars per problem, or $4y$ altogether. (Note how $b = 4$ has
gone into the cost.) The cheater must be as cheap as the Ph.D. or student or
machine: $y \le 5$ and $y \le 3$ and $2y \le 8$. (Note how $c = (5, 3, 8)$ has gone into
inequality constraints). The cheater maximizes the income $4y$.


Dual Problem   Maximize $b \cdot y$ subject to $A^T y \le c$.


The maximum occurs when $y = 3$. The income is $4y = 12$. The maximum in the dual
problem (\$12) equals the minimum in the original (\$12). Max = min is duality.

If either problem has a best vector ($x^*$ or $y^*$) then so does the other.


Minimum cost $c \cdot x^*$ equals maximum income $b \cdot y^*$


This book started with a row picture and a column picture. The first “duality theorem” was
about rank: The number of independent rows equals the number of independent columns.
That theorem, like this one, was easy for small matrices. Minimum cost = maximum
income is proved in our text Linear Algebra and Its Applications. One line will establish
the easy half of the theorem: The cheater’s income $b^T y$ cannot exceed the honest cost:


If $Ax = b$, $x \ge 0$, $A^T y \le c$ then $b^T y = (Ax)^T y = x^T (A^T y) \le x^T c$.   (1)


The full duality theorem says that when $b^T y$ reaches its maximum and $x^T c$ reaches its
minimum, they are equal: $b \cdot y^* = c \cdot x^*$. Look at the last step in (1), with $\le$ sign:


The dot product of $x \ge 0$ and $s = c - A^T y \ge 0$ gave $x^T s \ge 0$. This is $x^T A^T y \le x^T c$.


Equality needs $x^T s = 0$   So the optimal solution has $x_j = 0$ or $s_j = 0$ for each $j$.
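
For the homework example these statements can be verified by direct arithmetic. The sketch below assumes the one-equation case with $b > 0$ and every $a_j > 0$, so the dual constraints $a_j y \le c_j$ reduce to $y \le \min_j c_j / a_j$. It is not a general dual solver, and the variable names are illustrative.

```python
a = [1, 1, 2]          # the single row of A
b = 4
c = [5, 3, 8]

# Dual: maximize b*y subject to a_j * y <= c_j for every j.
y_star = min(c_j / a_j for a_j, c_j in zip(a, c))
income = b * y_star

# Primal optimum found earlier at the corner Q.
x_star = [0, 4, 0]
min_cost = sum(c_j * x_j for c_j, x_j in zip(c, x_star))

# Slack s = c - A^T y and the optimality condition x_j * s_j = 0.
slack = [c_j - a_j * y_star for a_j, c_j in zip(a, c)]
complementary = all(x_j * s_j == 0 for x_j, s_j in zip(x_star, slack))

# Weak duality (1) for a feasible pair that is not optimal.
x_feasible, y_feasible = [1, 1, 1], 2
weak = b * y_feasible <= sum(c_j * x_j for c_j, x_j in zip(c, x_feasible))

print(y_star, income, min_cost, slack, complementary, weak)
```

The slack is zero only in the student's component, which is exactly where $x^*$ is nonzero.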


The Simplex Method 


Elimination is the workhorse for linear equations. The simplex method is the workhorse for 
linear inequalities. We cannot give the simplex method as much space as elimination, but 
the idea can be clear. The simplex method goes from one corner to a neighboring corner of 
lower cost. Eventually (and quite soon in practice) it reaches the corner of minimum cost. 

A corner is a vector $x \ge 0$ that satisfies the $m$ equations $Ax = b$ with at most $m$
positive components. The other $n - m$ components are zero. (Those are the free variables.
Back substitution gives the $m$ basic variables. All variables must be nonnegative or $x$ is
a false corner.) For a neighboring corner, one zero component of $x$ becomes positive and
one positive component becomes zero.


The simplex method must decide which component “enters” by becoming positive,
and which component “leaves” by becoming zero. That exchange is chosen so as to
lower the total cost. This is one step of the simplex method, moving toward $x^*$.


Here is the overall plan. Look at each zero component at the current corner. If it
changes from 0 to 1, the other nonzeros have to adjust to keep $Ax = b$. Find the new $x$
by back substitution and compute the change in the total cost $c \cdot x$. This change is the
“reduced cost” $r$ of the new component. The entering variable is the one that gives the
most negative $r$. This is the greatest cost reduction for a single unit of a new variable.


Example 1 Suppose the current corner is $P = (4, 0, 0)$, with the Ph.D. doing all the
work (the cost is \$20). If the student works one hour, the cost of $x = (3, 1, 0)$ is down to
\$18. The reduced cost is $r = -2$. If the machine works one hour, then $x = (2, 0, 1)$ also
costs \$18. The reduced cost is also $r = -2$. In this case the simplex method can choose
either the student or the machine as the entering variable.

Even in this small example, the first step may not go immediately to the best $x^*$.
The method chooses the entering variable before it knows how much of that variable
to include. We computed $r$ when the entering variable changes from 0 to 1, but one unit
may be too much or too little. The method now chooses the leaving variable (the Ph.D.).
It moves to corner $Q$ or $R$ in the figure.

The more of the entering variable we include, the lower the cost. This has to stop
when one of the positive components (which are adjusting to keep $Ax = b$) hits zero. The
leaving variable is the first positive $x_i$ to reach zero. When that happens, a neighboring
corner has been found. Then start again (from the new corner) to find the next variables to
enter and leave.

When all reduced costs are positive, the current corner is the optimal $x^*$.
No zero component can become positive without increasing $c \cdot x$. No new variable should
enter. The problem is solved (and we can show that $y^*$ is found too).


Note Generally $x^*$ is reached in $\alpha n$ steps, where $\alpha$ is not large. But examples have been
invented which use an exponential number of simplex steps. Eventually a different
approach was developed, which is guaranteed to reach $x^*$ in fewer (but more difficult) steps.
The new methods travel through the interior of the feasible set.


Example 2 Minimize the cost $c \cdot x = 3x_1 + x_2 + 9x_3 + x_4$. The constraints are $x \ge 0$
and two equations $Ax = b$:


$x_1 + 2x_3 + x_4 = 4$     $m = 2$ equations
$x_2 + x_3 - x_4 = 2$     $n = 4$ unknowns.


A starting corner is $x = (4, 2, 0, 0)$ which costs $c \cdot x = 14$. It has $m = 2$ nonzeros and
$n - m = 2$ zeros. The zeros are $x_3$ and $x_4$. The question is whether $x_3$ or $x_4$ should enter
(become nonzero). Try one unit of each of them:


If $x_3 = 1$ and $x_4 = 0$, then $x = (2, 1, 1, 0)$ costs 16.
If $x_4 = 1$ and $x_3 = 0$, then $x = (3, 3, 0, 1)$ costs 13.


Compare those costs with 14. The reduced cost of $x_3$ is $r = 2$, positive and useless. The
reduced cost of $x_4$ is $r = -1$, negative and helpful. The entering variable is $x_4$.

How much of $x_4$ can enter? One unit of $x_4$ made $x_1$ drop from 4 to 3. Four units will
make $x_1$ drop from 4 to zero (while $x_2$ increases all the way to 6). The leaving variable is
$x_1$. The new corner is $x = (0, 6, 0, 4)$, which costs only $c \cdot x = 10$. This is the optimal
$x^*$, but to know that we have to try another simplex step from $(0, 6, 0, 4)$. Suppose $x_1$ or
$x_3$ tries to enter:


Start from the corner $(0, 6, 0, 4)$:
If $x_1 = 1$ and $x_3 = 0$, then $x = (1, 5, 0, 3)$ costs 11.
If $x_3 = 1$ and $x_1 = 0$, then $x = (0, 3, 1, 2)$ costs 14.

Those costs are higher than 10. Both $r$’s are positive, so it does not pay to move. The current
corner $(0, 6, 0, 4)$ is the solution $x^*$.
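
The reduced costs in Example 2 can be recomputed mechanically. The sketch below is a toy calculation for this $2$ by $4$ example only: it solves the $2$ by $2$ basic system by Cramer's rule with exact fractions. The function names (`solve_2x2`, `point`, `reduced_cost`) are illustrative and this is not how production simplex codes are organized.

```python
from fractions import Fraction

A = [[1, 0, 2, 1],
     [0, 1, 1, -1]]
b = [4, 2]
c = [3, 1, 9, 1]

def solve_2x2(B, rhs):
    """Solve the 2 by 2 system B z = rhs by Cramer's rule."""
    det = B[0][0] * B[1][1] - B[0][1] * B[1][0]
    return [Fraction(rhs[0] * B[1][1] - B[0][1] * rhs[1], det),
            Fraction(B[0][0] * rhs[1] - rhs[0] * B[1][0], det)]

def point(basis, entering=None):
    """Solve Ax = b for the basic variables, with one unit of `entering`."""
    rhs = list(b)
    if entering is not None:
        rhs = [b[i] - A[i][entering] for i in range(2)]
    B = [[A[i][j] for j in basis] for i in range(2)]
    x = [Fraction(0)] * 4
    for j, value in zip(basis, solve_2x2(B, rhs)):
        x[j] = value
    if entering is not None:
        x[entering] = Fraction(1)
    return x

def cost(x):
    return sum(c_j * x_j for c_j, x_j in zip(c, x))

def reduced_cost(basis, entering):
    """Change in cost when `entering` goes from 0 to 1."""
    return cost(point(basis, entering)) - cost(point(basis))

# Indices are 0-based: basis (0, 1) means x1 and x2 are the basic variables.
print(reduced_cost((0, 1), 2), reduced_cost((0, 1), 3))   # 2 and -1
print(point((1, 3)), cost(point((1, 3))))                 # corner (0,6,0,4), cost 10
print(reduced_cost((1, 3), 0), reduced_cost((1, 3), 2))   # 1 and 4
```

Both reduced costs at the corner $(0, 6, 0, 4)$ are positive, which confirms that no variable should enter.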

These calculations can be streamlined. Each simplex step solves three linear systems
with the same matrix $B$. (This is the $m$ by $m$ matrix that keeps the $m$ basic columns of $A$.)
When a column enters and an old column leaves, there is a quick way to update $B^{-1}$. That
is how most codes organize the simplex method.

Our text on Computational Science and Engineering includes a short code with
comments. (The code is also on math.mit.edu/cse) The best $y^*$ solves $m$ equations $A^T y^* = c$
in the $m$ components that are nonzero in $x^*$. Then we have optimality $x^T s = 0$ and this is
duality: Either $x_j^* = 0$ or the “slack” in $s^* = c - A^T y^*$ has $s_j^* = 0$.

When $x^* = (0, 4, 0)$ was the optimal corner $Q$, the cheater’s price was set by $y^* = 3$.


Interior Point Methods 


The simplex method moves along the edges of the feasible set, eventually reaching the
optimal corner $x^*$. Interior point methods move inside the feasible set (where x > 0).
These methods hope to go more directly to $x^*$. They work well.

One way to stay inside is to put a barrier at the boundary. Add extra cost as a
logarithm that blows up when any variable $x_j$ touches zero. The best vector has x > 0.
The number $\theta$ is a small parameter that we move toward zero.


Barrier problem    Minimize $c^T x - \theta(\log x_1 + \cdots + \log x_n)$ with $Ax = b$    (2)


This cost is nonlinear (but linear programming is already nonlinear from inequalities).
The constraints $x_j \ge 0$ are not needed because $\log x_j$ becomes infinite at $x_j = 0$.

The barrier gives an approximate problem for each $\theta$. The m constraints $Ax = b$ have
Lagrange multipliers $y_1, \ldots, y_m$. This is the good way to deal with constraints.


y from Lagrange    $L(x, y, \theta) = c^T x - \theta \left(\sum \log x_i\right) - y^T(Ax - b)$    (3)


$\partial L/\partial y = 0$ brings back $Ax = b$. The derivatives $\partial L/\partial x_j$ are interesting!


Optimality in barrier problem    $\dfrac{\partial L}{\partial x_j} = c_j - \dfrac{\theta}{x_j} - (A^T y)_j = 0$ which is $x_j s_j = \theta$.    (4)


The true problem has $x_j s_j = 0$. The barrier problem has $x_j s_j = \theta$. The solutions $x^*(\theta)$
lie on the central path to $x^*(0)$. Those n optimality equations $x_j s_j = \theta$ are nonlinear, and
we solve them iteratively by Newton’s method.

The current x, y, s will satisfy $Ax = b$, $x \ge 0$ and $A^T y + s = c$, but not $x_j s_j = \theta$.
Newton’s method takes a step $\Delta x, \Delta y, \Delta s$. By ignoring the second-order term $\Delta x\,\Delta s$
in $(x + \Delta x)(s + \Delta s) = \theta$, the corrections in x, y, s come from linear equations:


Newton step
$A\,\Delta x = 0$
$A^T \Delta y + \Delta s = 0$    (5)
$s_j\,\Delta x_j + x_j\,\Delta s_j = \theta - x_j s_j$


Newton iteration has quadratic convergence for each $\theta$, and then $\theta$ approaches zero.
The duality gap $x^T s$ generally goes below $10^{-8}$ after 20 to 60 steps. The explanation
in my Computational Science and Engineering textbook takes one Newton step in detail,
for the example with four homework problems. I didn’t intend that the student should end
up doing all the work, but $x^*$ turned out that way.
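To see the Newton equations (5) in action, here is a minimal sketch for the special case of one constraint (m = 1), where the elimination can be done by hand. Everything specific is an assumption for illustration: the function name `newton_step`, the data (one constraint $x_1 + x_2 + 2x_3 = 4$ with costs (5, 3, 8), taken to match the homework example), the interior starting point, and the value of $\theta$. It is not the code from Computational Science and Engineering.

```python
def newton_step(a, x, s, theta):
    """One Newton step (dx, dy, ds) for x_j * s_j = theta with one constraint a.x = b."""
    # From A^T dy + ds = 0:            ds_j = -a_j * dy
    # From s_j dx_j + x_j ds_j = rhs:  dx_j = (theta - x_j s_j - x_j ds_j) / s_j
    # Substituting into a.dx = 0 leaves one scalar equation for dy.
    num = sum(aj * (theta / sj - xj) for aj, xj, sj in zip(a, x, s))
    den = sum(aj * aj * xj / sj for aj, xj, sj in zip(a, x, s))
    dy = -num / den
    ds = [-aj * dy for aj in a]
    dx = [(theta - xj * sj - xj * dsj) / sj for xj, sj, dsj in zip(x, s, ds)]
    return dx, dy, ds

# Assumed data: one constraint x1 + x2 + 2*x3 = 4 with costs c = (5, 3, 8).
a = [1.0, 1.0, 2.0]
c = [5.0, 3.0, 8.0]
x = [1.0, 1.0, 1.0]                          # interior point with a.x = 4
y = 2.0                                      # assumed multiplier giving positive slacks
s = [cj - aj * y for cj, aj in zip(c, a)]    # slack s = c - A^T y = (3, 1, 4)
theta = 1.0                                  # assumed barrier parameter

dx, dy, ds = newton_step(a, x, s, theta)
print(dx, dy, ds)
```

A full method would repeat this step (with a step length that keeps x and s positive) and then reduce $\theta$. With more than one constraint the scalar division becomes an m by m linear solve.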

This interior point method is used almost “as is” in commercial software, for a large 
class of linear and nonlinear optimization problems. 


Problem Set 10.4 


1 Draw the region in the xy plane where $x + 2y = 6$ and $x \ge 0$ and $y \ge 0$. Which point
in this “feasible set” minimizes the cost $c = x + 3y$? Which point gives maximum
cost? Those points are at corners.


2 Draw the region in the xy plane where $x + 2y \le 6$, $2x + y \le 6$, $x \ge 0$, $y \ge 0$. It
has four corners. Which corner minimizes the cost $c = 2x - y$?


3 What are the corners of the set $x_1 + 2x_2 - x_3 = 4$ with $x_1, x_2, x_3$ all $\ge 0$? Show
that the cost $x_1 + 2x_3$ can be very negative in this feasible set. This is an example of
unbounded cost: no minimum.


4 Start at x = (0,0,2) where the machine solves all four problems for $16. Move 
to x = (0,1, ) to find the reduced cost r (the savings per hour) for work by the 
student. Find r for the Ph.D. by moving to x = (1,0, ) with 1 hour of Ph.D. work. 


5 Start Example 1 from the Ph.D. corner (4,0,0) with c changed to [5 3 7]. Show 
that r is better for the machine even when the total cost is lower for the student. The 
simplex method takes two steps, first to the machine and then to the student for x*. 


6 Choose a different cost vector c so the Ph.D. gets the job. Rewrite the dual problem 
(maximum income to the cheater). 


7 A six-problem homework on which the Ph.D. is fastest gives a second constraint
$2x_1 + x_2 + x_3 = 6$. Then x = (2, 2, 0) shows two hours of work by Ph.D. and
student on each homework. Does this x minimize the cost $c^T x$ with c = (5, 3, 8)?


8 These two problems are also dual. Prove weak duality, that always $y^T b \le c^T x$:


Primal problem Minimize $c^T x$ with $Ax \ge b$ and $x \ge 0$.
Dual problem Maximize $y^T b$ with $A^T y \le c$ and $y \ge 0$.
10.5 Fourier Series: Linear Algebra for Functions


This section goes from finite dimensions to infinite dimensions. I want to explain linear 
algebra in infinite-dimensional space, and to show that it still works. First step: look back. 
This book began with vectors and dot products and linear combinations. We begin by 
converting those basic ideas to the infinite case—then the rest will follow. 

What does it mean for a vector to have infinitely many components? There are two 
different answers, both good: 


1. The vector is infinitely long: $v = (v_1, v_2, v_3, \ldots)$. It could be $(1, \tfrac12, \tfrac14, \ldots)$.


2. The vector is a function f(x). It could be $v = \sin x$.


We will go both ways. Then the idea of a Fourier series will connect them.
After vectors come dot products. The natural dot product of two infinite vectors
$(v_1, v_2, \ldots)$ and $(w_1, w_2, \ldots)$ is an infinite series:


Dot product    $v \cdot w = v_1 w_1 + v_2 w_2 + \cdots$    (1)


This brings a new question, which never occurred to us for vectors in $R^n$. Does this infinite
sum add up to a finite number? Does the series converge? Here is the first and biggest
difference between finite and infinite.

When $v = w = (1, 1, 1, \ldots)$, the sum certainly does not converge. In that case
$v \cdot w = 1 + 1 + 1 + \cdots$ is infinite. Since v equals w, we are really computing $v \cdot v = \|v\|^2$,
the length squared. The vector $(1, 1, 1, \ldots)$ has infinite length. We don’t want that vector.
Since we are making the rules, we don’t have to include it. The only vectors to be allowed
are those with finite length:


DEFINITION The vector $v = (v_1, v_2, \ldots)$ and the function f(x) are in our infinite-dimensional
“Hilbert spaces” if and only if their lengths $\|v\|$ and $\|f\|$ are finite:


$\|v\|^2 = v \cdot v = v_1^2 + v_2^2 + v_3^2 + \cdots$ must add to a finite number.


$\|f\|^2 = (f, f) = \int_0^{2\pi} |f(x)|^2\,dx$ must be a finite integral.


Example 1 The vector $v = (1, \tfrac12, \tfrac14, \ldots)$ is included in Hilbert space, because its length
is $2/\sqrt{3}$. We have a geometric series that adds to 4/3. The length of v is the square root:


Length squared    $v \cdot v = 1 + \tfrac14 + \tfrac1{16} + \cdots = \dfrac{1}{1 - \tfrac14} = \dfrac43.$


Question If v and w have finite length, how large can their dot product be? 


Answer The sum $v \cdot w = v_1 w_1 + v_2 w_2 + \cdots$ also adds to a finite number. We can safely
take dot products. The Schwarz inequality is still true:


Schwarz inequality    $|v \cdot w| \le \|v\|\,\|w\|.$    (2)


The ratio of $v \cdot w$ to $\|v\|\,\|w\|$ is still the cosine of $\theta$ (the angle between v and w). Even
in infinite-dimensional space, $|\cos\theta|$ is not greater than 1.
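A quick numerical check of the length in Example 1 and of the Schwarz inequality, as a sketch only. The infinite vectors are truncated to N terms (an assumption that is harmless here because the tails are far below rounding error), and the second vector w = (1, 1/3, 1/9, ...) is chosen just for illustration.

```python
import math

N = 60  # number of terms kept (assumed; the neglected tail is negligible)
v = [0.5 ** k for k in range(N)]            # (1, 1/2, 1/4, ...)
w = [(1.0 / 3.0) ** k for k in range(N)]    # (1, 1/3, 1/9, ...), illustrative choice

def dot(a, b):
    """Dot product of two finite lists."""
    return sum(x * y for x, y in zip(a, b))

length_v = math.sqrt(dot(v, v))   # approaches 2 / sqrt(3)
length_w = math.sqrt(dot(w, w))
cos_theta = dot(v, w) / (length_v * length_w)
print(dot(v, v), length_v, cos_theta)
```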
Now change over to functions. Those are the “vectors.” The space of functions f(x),
g(x), h(x), ... defined for $0 \le x \le 2\pi$ must be somehow bigger than $R^n$. What is the dot
product of f(x) and g(x)? What is the length of f(x)?

Key point in the continuous case: Sums are replaced by integrals. Instead of a sum
of $v_j$ times $w_j$, the dot product is an integral of f(x) times g(x). Change the “dot” to
parentheses with a comma, and change the words “dot product” to inner product:


DEFINITION The inner product of f(x) and g(x), and the length squared of f(x), are


$(f, g) = \int_0^{2\pi} f(x)\,g(x)\,dx$ and $\|f\|^2 = \int_0^{2\pi} (f(x))^2\,dx.$    (3)


The interval $[0, 2\pi]$ where the functions are defined could change to a different interval
like [0, 1] or $(-\infty, \infty)$. We chose $2\pi$ because our first examples are sin x and cos x.


Example 2 The length of f(x) = sin x comes from its inner product with itself:


$(f, f) = \int_0^{2\pi} (\sin x)^2\,dx = \pi.$ The length of sin x is $\sqrt{\pi}$.


That is a standard integral in calculus, not part of linear algebra. By writing $\sin^2 x$ as
$\tfrac12 - \tfrac12 \cos 2x$, we see it go above and below its average value $\tfrac12$. Multiply that average by
the interval length $2\pi$ to get the answer $\pi$.


More important: sin x and cos x are orthogonal in function space: (f, g) = 0


Inner product is zero    $\int_0^{2\pi} \sin x \cos x\,dx = \int_0^{2\pi} \tfrac12 \sin 2x\,dx = \left[-\tfrac14 \cos 2x\right]_0^{2\pi} = 0.$    (4)


This zero is no accident. It is highly important to science. The orthogonality goes beyond
the two functions sin x and cos x, to an infinite list of sines and cosines. The list contains
cos 0x (which is 1), sin x, cos x, sin 2x, cos 2x, sin 3x, cos 3x, ....


Every function in that list is orthogonal to every other function in the list. 
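The orthogonality can be checked numerically. The sketch below approximates the inner product on $[0, 2\pi]$ with a midpoint rule (the helper `inner` and the number of sample points are assumptions) and builds the table of all inner products among the first seven functions in the list.

```python
import math

def inner(f, g, n=2000):
    """Midpoint-rule approximation of the inner product (f, g) on [0, 2*pi]."""
    h = 2 * math.pi / n
    return h * sum(f((i + 0.5) * h) * g((i + 0.5) * h) for i in range(n))

# The list 1, sin x, cos x, sin 2x, cos 2x, sin 3x, cos 3x.
basis = [lambda x: 1.0]
for k in (1, 2, 3):
    basis.append(lambda x, k=k: math.sin(k * x))
    basis.append(lambda x, k=k: math.cos(k * x))

# gram[i][j] approximates the inner product of function i with function j.
gram = [[inner(f, g) for g in basis] for f in basis]
for row in gram:
    print(["%.3f" % entry for entry in row])
```

The off-diagonal entries come out as zero to rounding error. The diagonal holds the squared lengths: $2\pi$ for the constant function and $\pi$ for every sine and cosine.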


Fourier Series 


The Fourier series of a function f(x) is its expansion into sines and cosines: 
$f(x) = a_0 + a_1 \cos x + b_1 \sin x + a_2 \cos 2x + b_2 \sin 2x + \cdots.$    (5)


We have an orthogonal basis! The vectors in “function space” are combinations of the sines
and cosines. On the interval from $x = 2\pi$ to $x = 4\pi$, all our functions repeat what they did
from 0 to $2\pi$. They are “periodic.” The distance between repetitions is the period $2\pi$.

Remember: The list is infinite. The Fourier series is an infinite series. We avoided
the vector v = (1, 1, 1, ...) because its length is infinite, now we avoid a function like
$\tfrac12 + \cos x + \cos 2x + \cos 3x + \cdots$. (Note: This is $\pi$ times the famous delta function $\delta(x)$.
It is an infinite “spike” above a single point. At x = 0 its height $\tfrac12 + 1 + 1 + \cdots$ is infinite.
At all points inside $0 < x < 2\pi$ the series adds in some average way to zero.) The integral
of $\delta(x)$ is 1. But $\int \delta^2(x) = \infty$, so delta functions are not allowed into Hilbert space.

Compute the length of a typical sum f(x):


$(f, f) = \int_0^{2\pi} (a_0 + a_1 \cos x + b_1 \sin x + a_2 \cos 2x + \cdots)^2\,dx$


$\quad\quad = \int_0^{2\pi} (a_0^2 + a_1^2 \cos^2 x + b_1^2 \sin^2 x + a_2^2 \cos^2 2x + \cdots)\,dx$


$\|f\|^2 = 2\pi a_0^2 + \pi(a_1^2 + b_1^2 + a_2^2 + \cdots).$    (6)


The step from line 1 to line 2 used orthogonality. All products like cos x cos 2x integrate to
give zero. Line 2 contains what is left: the integrals of each sine and cosine squared. Line
3 evaluates those integrals. (The integral of $1^2$ is $2\pi$, when all other integrals give $\pi$.) If we
divide by their lengths, our functions become orthonormal:


$\dfrac{1}{\sqrt{2\pi}},\ \dfrac{\cos x}{\sqrt{\pi}},\ \dfrac{\sin x}{\sqrt{\pi}},\ \dfrac{\cos 2x}{\sqrt{\pi}},\ \ldots$ is an orthonormal basis for our function space.


These are unit vectors. We could combine them with coefficients $A_0, A_1, B_1, A_2, \ldots$ to
yield a function F(x). Then the $2\pi$ and the $\pi$’s drop out of the formula for length.


Function length = vector length    $\|F\|^2 = (F, F) = A_0^2 + A_1^2 + B_1^2 + A_2^2 + \cdots.$    (7)


Here is the important point, for f(x) as well as F(x). The function has finite length exactly
when the vector of coefficients has finite length. Fourier series gives us a perfect match
between the Hilbert spaces for functions and for vectors. The function is in $L^2$, its Fourier
coefficients are in $\ell^2$.


The function space contains f(x) exactly when the Hilbert space contains the vector
$v = (a_0, a_1, b_1, \ldots)$ of Fourier coefficients of f(x). Both must have finite length.


Example 3 Suppose f(x) is a “square wave,” equal to 1 for $0 < x < \pi$. Then f(x) drops
to −1 for $\pi < x < 2\pi$. The +1 and −1 repeat forever. This f(x) is an odd function like
the sines, and all its cosine coefficients are zero. We will find its Fourier series, containing
only sines:


Fourier series for the square wave    $f(x) = \dfrac{4}{\pi}\left[\dfrac{\sin x}{1} + \dfrac{\sin 3x}{3} + \dfrac{\sin 5x}{5} + \cdots\right].$    (8)


The length of this function is $\sqrt{2\pi}$, because at every point $(f(x))^2$ is $(-1)^2$ or $(+1)^2$:


$\|f\|^2 = \int_0^{2\pi} (f(x))^2\,dx = \int_0^{2\pi} 1\,dx = 2\pi.$


At x = 0 the sines are zero and the Fourier series gives zero. This is half way up the jump
from −1 to +1. The Fourier series is also interesting when $x = \tfrac{\pi}{2}$. At this point the square
wave equals 1, and the sines in (8) alternate between +1 and −1:


Formula for $\pi$    $1 = \dfrac{4}{\pi}\left(1 - \tfrac13 + \tfrac15 - \tfrac17 + \cdots\right).$    (9)


Multiply by $\pi$ to find a magical formula $4\left(1 - \tfrac13 + \tfrac15 - \tfrac17 + \cdots\right)$ for that famous number.


The Fourier Coefficients 


How do we find the a’s and b’s which multiply the cosines and sines? For a given function
f(x), we are asking for its Fourier coefficients $a_k$ and $b_k$:


Fourier series    $f(x) = a_0 + a_1 \cos x + b_1 \sin x + a_2 \cos 2x + \cdots.$


Here is the way to find $a_1$. Multiply both sides by cos x. Then integrate from 0 to $2\pi$.
The key is orthogonality! All integrals on the right side are zero, except for $\cos^2 x$:


For coefficient $a_1$    $\int_0^{2\pi} f(x) \cos x\,dx = \int_0^{2\pi} a_1 \cos^2 x\,dx = \pi a_1.$    (10)


Divide by $\pi$ and you have $a_1$. To find any other $a_k$, multiply the Fourier series by cos kx.
Integrate from 0 to $2\pi$. Use orthogonality, so only the integral of $a_k \cos^2 kx$ is left. That
integral is $\pi a_k$, and divide by $\pi$:


$a_k = \dfrac{1}{\pi}\int_0^{2\pi} f(x) \cos kx\,dx$ and similarly $b_k = \dfrac{1}{\pi}\int_0^{2\pi} f(x) \sin kx\,dx.$    (11)


The exception is $a_0$. This time we multiply by cos 0x = 1. The integral of 1 is $2\pi$:


Constant term    $a_0 = \dfrac{1}{2\pi}\int_0^{2\pi} f(x) \cdot 1\,dx$ = average value of f(x).    (12)


I used those formulas to find the Fourier coefficients for the square wave in equation (8). 
The integral of f(x) cos kx was zero. The integral of f(x) sin kx was 4/k for odd k. 
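Formulas (11) and (12) can be tried on the square wave with a simple numerical integration. This is a sketch under stated assumptions: the midpoint rule with 20000 points stands in for the exact integrals, and the names `square_wave`, `integrate`, `a`, and `b` are hypothetical helpers.

```python
import math

def square_wave(x):
    """+1 on (0, pi) and -1 on (pi, 2*pi), repeated with period 2*pi."""
    return 1.0 if (x % (2 * math.pi)) < math.pi else -1.0

def integrate(g, n=20000):
    """Midpoint-rule approximation of the integral of g from 0 to 2*pi."""
    h = 2 * math.pi / n
    return h * sum(g((i + 0.5) * h) for i in range(n))

def a(k):
    """Cosine coefficient a_k from formula (11), for k >= 1."""
    return integrate(lambda x: square_wave(x) * math.cos(k * x)) / math.pi

def b(k):
    """Sine coefficient b_k from formula (11)."""
    return integrate(lambda x: square_wave(x) * math.sin(k * x)) / math.pi

a0 = integrate(square_wave) / (2 * math.pi)   # formula (12): the average value

for k in range(1, 6):
    print(k, round(a(k), 6), round(b(k), 6), round(4 / (math.pi * k), 6))
```

For odd k the computed $b_k$ agrees with $4/(\pi k)$, which is the integral 4/k divided by $\pi$. The cosine coefficients and the even sine coefficients come out as zero to the accuracy of the integration.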


Compare Linear Algebra in $R^n$


Infinite-dimensional Hilbert space is very much like the n-dimensional space $R^n$. Suppose
the nonzero vectors $v_1, \ldots, v_n$ are orthogonal in $R^n$. We want to write the vector b (instead
of the function f(x)) as a combination of those v’s:


Finite orthogonal series    $b = c_1 v_1 + c_2 v_2 + \cdots + c_n v_n.$    (13)


Multiply both sides by $v_1^T$. Use orthogonality, so $v_1^T v_2 = 0$. Only the $c_1$ term is left:


Coefficient $c_1$    $v_1^T b = c_1 v_1^T v_1 + 0 + \cdots + 0.$ Therefore $c_1 = v_1^T b / v_1^T v_1.$    (14)


The denominator $v_1^T v_1$ is the length squared, like $\pi$ in equation (11). The numerator
$v_1^T b$ is the inner product like $\int f(x) \cos kx\,dx$. Coefficients are easy to find when the
basis vectors are orthogonal. We are just doing one-dimensional projections, to find the
components along each basis vector.

The formulas are even better when the vectors are orthonormal. Then we have unit
vectors, the columns of Q. The denominators $v_k^T v_k$ are all 1. You know $c_k = v_k^T b$ in another form:


Equation for c’s    $c_1 v_1 + \cdots + c_n v_n = b$ or $\begin{bmatrix} v_1 & \cdots & v_n \end{bmatrix} \begin{bmatrix} c_1 \\ \vdots \\ c_n \end{bmatrix} = b.$


$Qc = b$ yields $c = Q^T b$. Row by row this is $c_k = q_k^T b$.


Fourier series is like having a matrix with infinitely many orthogonal columns. Those
columns are the basis functions 1, cos x, sin x, .... After dividing by their lengths we have
an “infinite orthogonal matrix.” Its inverse is its transpose, $Q^T$. Orthogonality is what
reduces a series of terms to one single term, when we integrate.
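The finite version, formula (14), takes only a few lines. The three orthogonal vectors and the vector b below are chosen for illustration (they are not from the text), and exact fractions are used so the coefficients come out without rounding.

```python
from fractions import Fraction

def dot(u, w):
    """Dot product of two vectors given as tuples."""
    return sum(ui * wi for ui, wi in zip(u, w))

# Illustrative orthogonal vectors in R^3, each of length 3.
vs = [(1, 2, 2), (2, 1, -2), (2, -2, 1)]
b = (9, 0, 9)

# Formula (14): c_k = (v_k . b) / (v_k . v_k), one projection per basis vector.
coeffs = [Fraction(dot(v, b), dot(v, v)) for v in vs]

# Rebuild b from the combination c_1 v_1 + c_2 v_2 + c_3 v_3.
rebuilt = tuple(sum(ck * v[i] for ck, v in zip(coeffs, vs)) for i in range(3))
print(coeffs, rebuilt)
```

No linear system is solved: each coefficient comes from one dot product divided by one squared length, exactly as each Fourier coefficient comes from one integral.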


Problem Set 10.5 


1 Integrate the trig identity $2 \cos jx \cos kx = \cos(j + k)x + \cos(j - k)x$ to show that
cos jx is orthogonal to cos kx, provided $j \ne k$. What is the result when j = k?


2 Show that 1, x, and $x^2 - \tfrac13$ are orthogonal, when the integration is from x = −1 to
x = 1. Write $f(x) = 2x^2$ as a combination of those orthogonal functions.


3 Find a vector $(w_1, w_2, w_3, \ldots)$ that is orthogonal to $v = (1, \tfrac12, \tfrac14, \ldots)$. Compute its
length $\|w\|$.


4 The first three Legendre polynomials are 1, x, and $x^2 - \tfrac13$. Choose c so that the fourth
polynomial $x^3 - cx$ is orthogonal to the first three. All integrals go from −1 to 1.


5 For the square wave f(x) in Example 3 jumping from 1 to −1, show that


$\int_0^{2\pi} f(x) \cos x\,dx = 0 \qquad \int_0^{2\pi} f(x) \sin x\,dx = 4 \qquad \int_0^{2\pi} f(x) \sin 2x\,dx = 0.$


Which three Fourier coefficients come from those integrals?


6 The square wave has $\|f\|^2 = 2\pi$. Then (6) gives what remarkable sum for $\pi^2$?


7 Graph the square wave. Then graph by hand the sum of two sine terms in its series, 
or graph by machine the sum of 2, 3, and 10 terms. The famous Gibbs phenomenon 
is the oscillation that overshoots the jump (this doesn’t die down with more terms). 


8 Find the lengths of these vectors in Hilbert space: 


(a) $v = \left(\tfrac{1}{\sqrt{1}}, \tfrac{1}{\sqrt{2}}, \tfrac{1}{\sqrt{4}}, \ldots\right)$
(b) $v = (1, a, a^2, \ldots)$
(c) $f(x) = 1 + \sin x$.


9 Compute the Fourier coefficients $a_k$ and $b_k$ for f(x) defined from 0 to $2\pi$:

(a) f(x) = 1 for $0 \le x \le \pi$, f(x) = 0 for $\pi < x < 2\pi$
(b) f(x) = x.


10 When f(x) has period $2\pi$, why is its integral from $-\pi$ to $\pi$ the same as from 0 to
$2\pi$? If f(x) is an odd function, f(−x) = −f(x), show that $\int_0^{2\pi} f(x)\,dx$ is zero.
Odd functions only have sine terms, even functions only have cosines.


11 Using trigonometric identities find the two terms in the Fourier series for f(x):
(a) $f(x) = \cos^2 x$    (b) $f(x) = \cos\left(x + \tfrac{\pi}{3}\right)$    (c) $f(x) = \sin^3 x$


12 The functions 1, cos x, sin x, cos 2x, sin 2x, ... are a basis for Hilbert space. Write
the derivatives of those first five functions as combinations of the same five functions.
What is the 5 by 5 “differentiation matrix” for these functions?


13 Find the Fourier coefficients $a_k$ and $b_k$ of the square pulse F(x) centered at x = 0:
F(x) = 1/h for $|x| \le h/2$ and F(x) = 0 for $h/2 < |x| \le \pi$.
As $h \to 0$, this F(x) approaches a delta function. Find the limits of $a_k$ and $b_k$.


Section 4.1 of Computational Science and Engineering explains the sine series,
cosine series, complete series, and complex series $\sum c_k e^{ikx}$ on math.mit.edu/cse.


Section 9.3 of this book explains the Discrete Fourier Transform. This is “Fourier
series for vectors” and it is computed by the Fast Fourier Transform. That fast
algorithm comes quickly from special complex numbers $z = e^{i\theta} = \cos\theta + i \sin\theta$
when the angle is $\theta = 2\pi k/n$.
10.6 Computer Graphics


Computer graphics deals with images. The images are moved around. Their scale is changed. 
Three dimensions are projected onto two dimensions. All the main operations are done by 
matrices—but the shape of these matrices is surprising. 

The transformations of three-dimensional space are done with 4 by 4 matrices. You 
would expect 3 by 3. The reason for the change is that one of the four key operations 
cannot be done with a 3 by 3 matrix multiplication. Here are the four operations: 


Translation (shift the origin to another point $P_0 = (x_0, y_0, z_0)$)
Rescaling (by c in all directions or by different factors $c_1, c_2, c_3$)
Rotation (around an axis through the origin or an axis through $P_0$)
Projection (onto a plane through the origin or a plane through $P_0$).

Translation is the easiest: just add $(x_0, y_0, z_0)$ to every point. But this is not linear! No 3
by 3 matrix can move the origin. So we change the coordinates of the origin to (0, 0, 0, 1).


This is why the matrices are 4 by 4. The “homogeneous coordinates” of the point (x,y, z) 
are (x,y,z, 1) and we now show how they work. 


1. Translation Shift the whole three-dimensional space along the vector $v_0$. The origin
moves to $(x_0, y_0, z_0)$. This vector $v_0$ is added to every point v in $R^3$. Using homogeneous
coordinates, the 4 by 4 matrix T shifts the whole space by $v_0$:


Translation matrix    $T = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ x_0 & y_0 & z_0 & 1 \end{bmatrix}$


Important: Computer graphics works with row vectors. We have row times matrix instead
of matrix times column. You can quickly check that $[0\ \ 0\ \ 0\ \ 1]\,T = [x_0\ \ y_0\ \ z_0\ \ 1]$.

To move the points (0, 0, 0) and (x, y, z) by $v_0$, change to homogeneous coordinates
(0, 0, 0, 1) and (x, y, z, 1). Then multiply by T. A row vector times T gives a row vector.
Every v moves to $v + v_0$: $[x\ \ y\ \ z\ \ 1]\,T = [x + x_0\ \ \ y + y_0\ \ \ z + z_0\ \ \ 1]$.

The output tells where any v will move. (It goes to $v + v_0$.) Translation is now achieved
by a matrix, which was impossible in $R^3$.
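A small sketch of translation in homogeneous coordinates, using the row-vector convention. The helper names `row_times_matrix` and `translation` are hypothetical, and the shift (1, 2, 3) and the test point are chosen only for illustration.

```python
def row_times_matrix(v, M):
    """Row vector v times matrix M (row times matrix, as in the text)."""
    return [sum(v[i] * M[i][j] for i in range(len(v))) for j in range(len(M[0]))]

def translation(x0, y0, z0):
    """4 by 4 translation matrix with the shift in the last row."""
    return [[1, 0, 0, 0],
            [0, 1, 0, 0],
            [0, 0, 1, 0],
            [x0, y0, z0, 1]]

T = translation(1, 2, 3)
origin = row_times_matrix([0, 0, 0, 1], T)   # the origin moves to (1, 2, 3)
point = row_times_matrix([2, 5, 7, 1], T)    # (2, 5, 7) moves to (3, 7, 10)
print(origin, point)
```

The last component stays 1 because the fourth column of T is (0, 0, 0, 1).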


2. Scaling To make a picture fit a page, we change its width and height. A copier
will rescale a figure by 90%. In linear algebra, we multiply by .9 times the identity matrix.
That matrix is normally 2 by 2 for a plane and 3 by 3 for a solid. In computer graphics,
with homogeneous coordinates, the matrix is one size larger:


Rescale the plane: $S = \begin{bmatrix} .9 & 0 & 0 \\ 0 & .9 & 0 \\ 0 & 0 & 1 \end{bmatrix}$    Rescale a solid: $S = \begin{bmatrix} c & 0 & 0 & 0 \\ 0 & c & 0 & 0 \\ 0 & 0 & c & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}$
Important: S is not cI. We keep the “1” in the lower corner. Then [x, y, 1] times S is the
correct answer in homogeneous coordinates. The origin stays in its normal position because
$[0\ \ 0\ \ 1]\,S = [0\ \ 0\ \ 1]$.

If we change that 1 to c, the result is strange. The point (cx, cy, cz, c) is the same
as (x, y, z, 1). The special property of homogeneous coordinates is that multiplying by cI
does not move the point. The origin in $R^3$ has homogeneous coordinates (0, 0, 0, 1) and
(0, 0, 0, c) for every nonzero c. This is the idea behind the word “homogeneous.”

Scaling can be different in different directions. To fit a full-page picture onto a half-page,
scale the y direction by $\tfrac12$. To create a margin, scale the x direction by $\tfrac34$. The
graphics matrix is diagonal but not 2 by 2. It is 3 by 3 to rescale a plane and 4 by 4 to
rescale a space:


Scaling matrices    $S = \begin{bmatrix} \tfrac34 & & \\ & \tfrac12 & \\ & & 1 \end{bmatrix}$ and $S = \begin{bmatrix} c_1 & & & \\ & c_2 & & \\ & & c_3 & \\ & & & 1 \end{bmatrix}$


That last matrix S rescales the x, y, z directions by positive numbers $c_1, c_2, c_3$. The extra
column in all these matrices leaves the extra 1 at the end of every vector.


Summary The scaling matrix S is the same size as the translation matrix T. They can
be multiplied. To translate and then rescale, multiply vTS. To rescale and then translate,
multiply vST. Are those different? Yes.
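The order of translation and scaling can be tested directly. In this sketch the shift (1, 2, 3), the factor c = 2, and the point (1, 1, 1) are all illustrative choices, and `matmul` is a hypothetical helper for small matrices stored as lists of rows.

```python
def matmul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

T = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [1, 2, 3, 1]]   # translate by (1, 2, 3)
S = [[2, 0, 0, 0], [0, 2, 0, 0], [0, 0, 2, 0], [0, 0, 0, 1]]   # rescale by c = 2

v = [[1, 1, 1, 1]]                                  # the point (1, 1, 1) as a 1 by 4 row
translate_then_scale = matmul(matmul(v, T), S)[0]   # v T S
scale_then_translate = matmul(matmul(v, S), T)[0]   # v S T
print(translate_then_scale, scale_then_translate)
```

In vTS the shift is doubled along with the point; in vST it is added after the doubling. The two results differ.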


The point (x, y, z) in $R^3$ has homogeneous coordinates (x, y, z, 1) in $P^3$. This “projective
space” is not the same as $R^4$. It is still three-dimensional. To achieve such a thing,
(cx, cy, cz, c) is the same point as (x, y, z, 1). Those points of projective space $P^3$ are really
lines through the origin in $R^4$.

Computer graphics uses affine transformations, linear plus shift. An affine transformation
T is executed on $P^3$ by a 4 by 4 matrix with a special fourth column:


$A = \begin{bmatrix} a_{11} & a_{12} & a_{13} & 0 \\ a_{21} & a_{22} & a_{23} & 0 \\ a_{31} & a_{32} & a_{33} & 0 \\ a_{41} & a_{42} & a_{43} & 1 \end{bmatrix} = \begin{bmatrix} T(1, 0, 0) & 0 \\ T(0, 1, 0) & 0 \\ T(0, 0, 1) & 0 \\ T(0, 0, 0) & 1 \end{bmatrix}.$


The usual 3 by 3 matrix tells us three outputs, this tells four. The usual outputs come 
from the inputs (1, 0, 0) and (0, 1,0) and (0, 0, 1). When the transformation is linear, three 
outputs reveal everything. When the transformation is affine, the matrix also contains the 
output from (0, 0,0). Then we know the shift. 


3. Rotation A rotation in $R^2$ or $R^3$ is achieved by an orthogonal matrix Q. The determinant
is +1. (With determinant −1 we get an extra reflection through a mirror.) Include the
extra column when you use homogeneous coordinates!


Plane rotation    $Q = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix}$ becomes $R = \begin{bmatrix} \cos\theta & -\sin\theta & 0 \\ \sin\theta & \cos\theta & 0 \\ 0 & 0 & 1 \end{bmatrix}.$
This matrix rotates the plane around the origin. How would we rotate around a
different point (4, 5)? The answer brings out the beauty of homogeneous coordinates.
Translate (4, 5) to (0, 0), then rotate by $\theta$, then translate (0, 0) back to (4, 5):


$v\,T_- R\,T_+ = \begin{bmatrix} x & y & 1 \end{bmatrix} \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ -4 & -5 & 1 \end{bmatrix} \begin{bmatrix} \cos\theta & -\sin\theta & 0 \\ \sin\theta & \cos\theta & 0 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 4 & 5 & 1 \end{bmatrix}$


I won’t multiply. The point is to apply the matrices one at a time: v translates to $vT_-$, then
rotates to $vT_-R$, and translates back to $vT_-RT_+$. Because each point $[x\ \ y\ \ 1]$ is a row
vector, $T_-$ acts first. The center of rotation (4, 5), otherwise known as (4, 5, 1), moves
first to (0, 0, 1). Rotation doesn’t change it. Then $T_+$ moves it back to (4, 5, 1). All as it
should be. The point (4, 6, 1) moves to (0, 1, 1), then turns by $\theta$ and moves back.
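The three-step rotation about (4, 5) can be multiplied out by machine. This is a sketch with hypothetical helper names (`matmul`, `translate2d`, `rotate2d`, `rotate_about`) and an angle of 90 degrees chosen for illustration; points are row vectors, as in the text.

```python
import math

def matmul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

def translate2d(dx, dy):
    """3 by 3 translation matrix for the plane (shift in the last row)."""
    return [[1, 0, 0], [0, 1, 0], [dx, dy, 1]]

def rotate2d(theta):
    """3 by 3 matrix R built from the plane rotation Q."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, 0], [s, c, 0], [0, 0, 1]]

def rotate_about(cx, cy, theta):
    """T_minus R T_plus: rotation by theta about the point (cx, cy)."""
    return matmul(matmul(translate2d(-cx, -cy), rotate2d(theta)), translate2d(cx, cy))

M = rotate_about(4, 5, math.pi / 2)
center = matmul([[4, 5, 1]], M)[0]   # the center of rotation stays fixed
moved = matmul([[4, 6, 1]], M)[0]    # (4, 6) goes to (0, 1), turns, and moves back
print(center, moved)
```

With this row-vector convention and a 90 degree angle, the point (4, 6) lands at (5, 5) up to rounding, still at distance 1 from the center.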

In three dimensions, every rotation Q turns around an axis. The axis doesn’t move: it
is a line of eigenvectors with $\lambda = 1$. Suppose the axis is in the z direction. The 1 in Q is to
leave the z axis alone, the extra 1 in R is to leave the origin alone:


$Q = \begin{bmatrix} \cos\theta & -\sin\theta & 0 \\ \sin\theta & \cos\theta & 0 \\ 0 & 0 & 1 \end{bmatrix}$ and $R = \begin{bmatrix} & & & 0 \\ & Q & & 0 \\ & & & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}$


Now suppose the rotation is around the unit vector $a = (a_1, a_2, a_3)$. With this axis a, the
rotation matrix Q which fits into R has three parts:


$Q = (\cos\theta)\,I + (1 - \cos\theta)\begin{bmatrix} a_1^2 & a_1 a_2 & a_1 a_3 \\ a_1 a_2 & a_2^2 & a_2 a_3 \\ a_1 a_3 & a_2 a_3 & a_3^2 \end{bmatrix} - \sin\theta \begin{bmatrix} 0 & a_3 & -a_2 \\ -a_3 & 0 & a_1 \\ a_2 & -a_1 & 0 \end{bmatrix}.$    (1)


The axis doesn’t move because aQ = a. When a = (0, 0, 1) is in the z direction, this Q
becomes the previous Q, for rotation around the z axis.

The linear transformation Q always goes in the upper left block of R. Below it we see
zeros, because rotation leaves the origin in place. When those are not zeros, the transformation
is affine and the origin moves.


4. Projection In a linear algebra course, most planes go through the origin. In real life,
most don’t. A plane through the origin is a vector space. The other planes are affine spaces,
sometimes called “flats.” An affine space is what comes from translating a vector space.

We want to project three-dimensional vectors onto planes. Start with a plane through
the origin, whose unit normal vector is n. (We will keep n as a column vector.) The
vectors in the plane satisfy $n^T v = 0$. The usual projection onto the plane is the matrix
$I - nn^T$. To project a vector, multiply by this matrix. The vector n is projected to zero,
and the in-plane vectors v are projected onto themselves:


$(I - nn^T)n = n - n(n^T n) = 0$ and $(I - nn^T)v = v - n(n^T v) = v.$


In homogeneous coordinates the projection matrix becomes 4 by 4 (but the origin doesn’t
move):


Projection onto the plane $n^T v = 0$    $P = \begin{bmatrix} & & & 0 \\ & I - nn^T & & 0 \\ & & & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}$

Now project onto a plane $n^T(v - v_0) = 0$ that does not go through the origin. One point
on the plane is $v_0$. This is an affine space (or a flat). It is like the solutions to Av = b
when the right side is not zero. One particular solution $v_0$ is added to the nullspace, to
produce a flat.

The projection onto the flat has three steps. Translate $v_0$ to the origin by $T_-$. Project
along the n direction, and translate back along the row vector $v_0$:


Projection onto a flat    $T_- P\,T_+ = \begin{bmatrix} I & 0 \\ -v_0 & 1 \end{bmatrix} \begin{bmatrix} I - nn^T & 0 \\ 0 & 1 \end{bmatrix} \begin{bmatrix} I & 0 \\ v_0 & 1 \end{bmatrix}$


I can’t help noticing that $T_-$ and $T_+$ are inverse matrices: translate and translate back. They
are like the elementary matrices of Chapter 2.


The exercises will include reflection matrices, also known as mirror matrices. These
are the fifth type needed in computer graphics. A reflection moves each point twice as far
as a projection: the reflection goes through the plane and out the other side. So change
the projection $I - nn^T$ to $I - 2nn^T$ for a mirror matrix.
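Projection and reflection matrices are easy to build and test. In the sketch below the unit normal n = (1/3, 2/3, 2/3) and the point (3, 3, 3) are illustrative assumptions, the helper names are hypothetical, and exact fractions avoid rounding. Both matrices are symmetric, so it does not matter here whether v is treated as a row or a column.

```python
from fractions import Fraction as F

n = [F(1, 3), F(2, 3), F(2, 3)]   # unit normal: 1/9 + 4/9 + 4/9 = 1

def identity_minus(scale):
    """Return I - scale * n n^T as a 3 by 3 list of rows."""
    return [[(1 if i == j else 0) - scale * n[i] * n[j] for j in range(3)]
            for i in range(3)]

def apply(M, v):
    """Matrix M times the vector v."""
    return [sum(M[i][j] * v[j] for j in range(3)) for i in range(3)]

P = identity_minus(1)   # projection onto the plane n^T v = 0
M = identity_minus(2)   # mirror matrix for the same plane

v = [3, 3, 3]
Pv = apply(P, v)
Mv = apply(M, v)
print(Pv, Mv)
```

The reflection moves v twice as far as the projection does, and reflecting twice returns the original point.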

The matrix P gave a “parallel” projection. All points move parallel to n, until they 
reach the plane. The other choice in computer graphics is a “perspective” projection. This 
is more popular because it includes foreshortening. With perspective, an object looks larger 
as it moves closer. Instead of staying parallel to n (and parallel to each other), the lines of 
projection come toward the eye—the center of projection. This is how we perceive depth 
in a two-dimensional photograph. 


The basic problem of computer graphics starts with a scene and a viewing position. Ideally, 
the image on the screen is what the viewer would see. The simplest image assigns just one 
bit to every small picture element—called a pixel. It is light or dark. This gives a black 
and white picture with no shading. You would not approve. In practice, we assign shading
levels between 0 and $2^8$ for three colors like red, green, and blue. That means 8 × 3 = 24
bits for each pixel. Multiply by the number of pixels, and a lot of memory is needed!

Physically, a raster frame buffer directs the electron beam. It scans like a television 
set. The quality is controlled by the number of pixels and the number of bits per pixel. 
In this area, the standard text is Computer Graphics: Principles and Practice by Hughes,
Van Dam, McGuire, Sklar, Foley, Feiner, and Akeley (3rd edition, Addison-Wesley, 2014).
Notes by Ronald Goldman and by Tony DeRose were excellent references.
REVIEW OF THE KEY IDEAS


1. Computer graphics needs shift operations $T(v) = v + v_0$ as well as linear operations
T(v) = Av.


2. A shift in $R^n$ can be executed by a matrix of order n + 1, using homogeneous coordinates.


3. The extra component 1 in [x y z 1] is preserved when all matrices have the numbers
0, 0, 0, 1 as last column.


Problem Set 10.6 


1 A typical point in $R^3$ is xi + yj + zk. The coordinate vectors i, j, and k are (1, 0, 0),
(0, 1, 0), (0, 0, 1). The coordinates of the point are (x, y, z).

This point in computer graphics is xi + yj + zk + origin. Its homogeneous coordinates
are ( , , , ). Other coordinates for the same point are ( , , , ).


2 A linear transformation T is determined when we know T(i), T(j), T(k). For an
affine transformation we also need T(____). The input point (x, y, z, 1) is transformed
to xT(i) + yT(j) + zT(k) + ____.


3 Multiply the 4 by 4 matrix T for translation along (1, 4, 3) and the matrix $T_1$ for
translation along (0, 2, 5). The product $TT_1$ is translation along ____.


4 Write down the 4 by 4 matrix S that scales by a constant c. Multiply ST and also 
TS, where T is translation by (1,4,3). To blow up the picture around the center 
point (1, 4,3), would you use vST or vT S? 


5 What scaling matrix S (in homogeneous coordinates, so 3 by 3) would produce a 
1 by 1 square page from a standard 8.5 by 11 page? 


6 What 4 by 4 matrix would move a corner of a cube to the origin and then multiply 
all lengths by 2? The corner of the cube is originally at (1, 1, 2). 


7 When the three matrices in equation 1 multiply the unit vector a, show that they give
$(\cos\theta)a$ and $(1 - \cos\theta)a$ and 0. Addition gives aQ = a and the rotation axis is not
moved.


8 If b is perpendicular to a, multiply by the three matrices in 1 to get $(\cos\theta)b$ and 0
and a vector perpendicular to b. So Qb makes an angle $\theta$ with b. This is rotation.


9 What is the 3 by 3 projection matrix $I - nn^T$ onto the plane $\tfrac23 x + \tfrac23 y + \tfrac13 z = 0$? In
homogeneous coordinates add 0, 0, 0, 1 as an extra row and column in P.
10 With the same 4 by 4 matrix P, multiply $T_- P\,T_+$ to find the projection matrix onto
the plane $\tfrac23 x + \tfrac23 y + \tfrac13 z = 1$. The translation $T_-$ moves a point on that plane (choose
one) to (0, 0, 0, 1). The inverse matrix $T_+$ moves it back.


11 Project (3, 3, 3) onto those planes. Use P in Problem 9 and $T_- P\,T_+$ in Problem 10.


12 If you project a square onto a plane, what shape do you get?


13 If you project a cube onto a plane, what is the outline of the projection? Make the
projection plane perpendicular to a diagonal of the cube.


14 The 3 by 3 mirror matrix that reflects through the plane $n^T v = 0$ is $M = I - 2nn^T$.
Find the reflection of the point (3, 3, 3) in the plane $\tfrac23 x + \tfrac23 y + \tfrac13 z = 0$.


15 Find the reflection of (3, 3, 3) in the plane $\tfrac23 x + \tfrac23 y + \tfrac13 z = 1$. Take three steps
$T_- M\,T_+$ using 4 by 4 matrices: translate by $T_-$ so the plane goes through the origin,
reflect the translated point $(3, 3, 3, 1)\,T_-$ in that plane, then translate back by $T_+$.


16 The vector between the origin (0, 0, 0, 1) and the point (x, y, z, 1) is the difference
v = ____. In homogeneous coordinates, vectors end in ____. So we add a ____
to a point, not a point to a point.


17 If you multiply only the last coordinate of each point to get (x, y, z, c), you rescale
the whole space by the number ____. This is because the point (x, y, z, c) is the
same as ( , , , 1).
10.7 Linear Algebra for Cryptography


1 Codes can use finite fields as alphabets: letters in the message become numbers 0, 1, ..., p − 1.


2 The numbers are added and multiplied (mod p). Divide by p, keep the remainder. 


3 A Hill Cipher multiplies blocks of the message by a secret matrix E (mod p). 


4 To decode, multiply each block by the inverse matrix D (mod p). Not a very secure cipher! 


Cryptography is about encoding and decoding messages. Banks do this all the time 
with financial information. Amazingly, modern algorithms can involve extremely deep 
mathematics. “Elliptic curves” play a part in cryptography, as they did in the sensational 
proof by Andrew Wiles of Fermat’s Last Theorem. 

This section will not go that far! But it will be our first experience with finite fields and
finite vector spaces. The field for $R^n$ contains all real numbers. The field for “modular
arithmetic” contains only p integers 0, 1, ..., p − 1. There were infinitely many vectors in
$R^n$; now there will only be $p^n$ messages of length n in message space. The alphabet from
A to Z is finite (as in p = 26).


The codes in this section will be easily breakable—they are much too simple for prac- 
tical security. The power of computers demands more complex cryptography, because that 
power would quickly detect a small encoding matrix. But a matrix code (the Hill Cipher) 
will allow us to see linear algebra at work in a new way. 

All our calculations in encoding and decoding will be “mod p”. But the central con- 
cepts of linear independence and bases and inverse matrices and determinants survive this 
change. We will be doing “linear algebra with finite fields”. Here is the meaning of mod p: 


27 ≡ 2 (mod 5) means that 27 - 2 is divisible by 5


y ≡ x (mod p) means that y - x is divisible by p


Dividing y by 5 produces one of the five possible remainders x = 0, 1, 2, 3, 4. All the numbers
5, -5, 10, -10, ... with no remainder are congruent to zero (mod 5). The numbers
y = 6, -4, 11, -9, ... are all congruent to x = 1 (mod 5).

We use the word congruent for the symbol ≡ and we call this “modular arithmetic”.
Every integer y produces one of the values x = 0, 1, 2, ..., p - 1.

The theory is best if p is a prime number. With p = 26 letters from A to Z, we
unfortunately don’t start with a prime p. Cryptography can deal with this problem.


Modular Arithmetic

Linear algebra is based on linear combinations of vectors. Now our vectors (x_1, ..., x_n)
are strings of integers limited to x = 0, 1, ..., p - 1. All calculations produce these integers
when we work “mod p”. This means: Every integer y outside that range is divided by p
and x is the remainder:


y = qp + x      y ≡ x (mod p)      y divided by p has remainder x


Addition mod 3      10 ≡ 1 (mod 3) and 16 ≡ 1 (mod 3) and 10 + 16 ≡ 1 + 1 (mod 3)


I could add 10 + 16 and divide 26 by 3 to get the remainder 2.
Or I can just add remainders 1 + 1 to reach the same answer 2.


Addition mod 2      11 ≡ 1 (mod 2) and 17 ≡ 1 (mod 2) and 11 + 17 = 28 ≡ 0 (mod 2)
The remainders added to 1 + 1 but this is not 2. The final step was 2 ≡ 0 (mod 2).


Addition mod p is completely reasonable. So is multiplication mod p. Here p = 3:
10 ≡ 1 (mod 3) times 16 ≡ 1 (mod 3) gives 1 times 1 = 1      160 ≡ 1 (mod 3)
5 ≡ 2 (mod 3) times 8 ≡ 2 (mod 3) gives 2 times 2 ≡ 1      40 ≡ 1 (mod 3)


Conclusion: We can safely add and multiply modulo p. So we can take linear combinations. 
This is the key operation in linear algebra. But can we divide ? 
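
The conclusion about adding and multiplying can be checked numerically. Here is a minimal Python sketch; the function names `add_mod` and `mul_mod` are illustrative and not part of the text. It compares working with remainders against reducing the ordinary sum or product.

```python
def add_mod(y1, y2, p):
    """Add the remainders of y1 and y2, then reduce mod p."""
    return ((y1 % p) + (y2 % p)) % p

def mul_mod(y1, y2, p):
    """Multiply the remainders of y1 and y2, then reduce mod p."""
    return ((y1 % p) * (y2 % p)) % p

# The examples from the text: each pair of printed values agrees.
print(add_mod(10, 16, 3), (10 + 16) % 3)   # both are 2
print(add_mod(11, 17, 2), (11 + 17) % 2)   # both are 0
print(mul_mod(10, 16, 3), (10 * 16) % 3)   # both are 1
print(mul_mod(5, 8, 3), (5 * 8) % 3)       # both are 1
```

Reducing early keeps the numbers small, which matters when a matrix multiplication produces many products.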

In the real number field, the inverse is 1/y (for any number except y = 0). This means:
We found another real number z so that yz = 1. Invertibility is a requirement for a field.
Is inversion always possible mod p? For every number y = 1, ..., p - 1 can we find
another number z = 1, ..., p - 1 so that yz ≡ 1 mod p?

The examples 3^{-1} ≡ 4 (mod 11) and 2^{-1} ≡ 6 (mod 11) and 5^{-1} ≡ 9 (mod 11) all
succeed. Can you solve 7z ≡ 1 (mod 11)? Inverting numbers will be the key to inverting
matrices.
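
The three inverses quoted above can be confirmed by a direct search. This is a small sketch under the assumption that p is small enough to try every z; the name `inverse_mod` is illustrative. (In Python 3.8 and later the built-in `pow(y, -1, p)` returns the same inverse when it exists.)

```python
def inverse_mod(y, p):
    """Return z in 1..p-1 with y*z congruent to 1 (mod p), or None if none exists."""
    for z in range(1, p):
        if (y * z) % p == 1:
            return z
    return None

for y in (3, 2, 5):
    z = inverse_mod(y, 11)
    print(y, z, (y * z) % 11)   # the last value is 1 for each pair
```

The same function can be tried on the question 7z ≡ 1 (mod 11).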


Let me show that inversion mod p has a problem when p is not a prime number. The
example p = 26 factors into 2 times 13. Then y = 2 cannot have an inverse z (mod 26).
The requirement 2z ≡ 1 (mod 26) is impossible to satisfy because 2z and 26 are even.

Similarly 5 has no inverse z when p is 25. We can’t solve 5z ≡ 1 (mod 25). The
number 5z - 1 is never going to be a multiple of 5, so it can’t be a multiple of 25.


Inversion of every y (0 < y < p) will be possible if and only if p is prime.
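
Before the reasoning, the statement can be tested on small cases. The sketch below (illustrative names, brute force only) lists which numbers y have inverses for a given modulus m, and checks whether all of 1, ..., m - 1 are invertible.

```python
def invertible_residues(m):
    """List every y with 0 < y < m that has an inverse z (mod m)."""
    return [y for y in range(1, m)
            if any((y * z) % m == 1 for z in range(1, m))]

def all_invertible(m):
    """True when every y with 0 < y < m is invertible (mod m)."""
    return len(invertible_residues(m)) == m - 1

for m in (11, 25, 26):
    print(m, all_invertible(m), invertible_residues(m))
```

For m = 26 the list omits 2 and 13 and their multiples; for m = 25 it omits the multiples of 5.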


Inversion needs y, 2y, 3y, ..., py to have different remainders when divided by p.
If my and ny had the same remainder x then (m - n)y would be divisible by p.
The prime number p would have to divide either m - n or y. Both are impossible.


So y, ..., py have different remainders: One of those remainders must be x = 1.


The Enigma Machine and the Hill Cipher


Lester Hill published his cipher (his system for encoding and decoding) in the American 
Mathematical Monthly (1929). The idea was simple, but in some way it started the transi- 
tion of cryptography from linguistics to mathematics. Codes up to that time mainly mixed 
up alphabets and rearranged messages. The Enigma code used by the German Navy in 
World War II was a giant advance—using machines that look to us like primitive comput- 
ers. The English set up Bletchley Park to break Enigma. They hired puzzle solvers and 
language majors. And by good luck they also happened to get Alan Turing. 

I don’t know if you have seen the movie about him: The Imitation Game. A lot of 
it is unrealistic (like Good Will Hunting and A Beautiful Mind at MIT). But the core idea 
of breaking the Enigma code was correct, using human weaknesses in the encoding and 
broadcasting. The German naval command openly sent out their coded orders—knowing 
that the codes were too complicated to break (if it hadn’t been for those weaknesses). 
The codebreaking required English electronics to undo the German electronics. It also 
required genius. 

Alan Turing was surely a genius—England’s most exceptional mathematician. His 
life was ultimately tragic and he ended it in 1954. The biography by Andrew Hodges 
is excellent. Turing arrived at Bletchley Park the day after Poland was invaded. It is to 
Winston Churchill’s credit that he gave fast and full support when his support was needed. 

The Enigma Machine had gears and wheels. The Hill Cipher only needs a matrix. That 
is the code to be explained now, using linear algebra. You will see how decoding involved 
inverse matrices. All steps use modular arithmetic, multiplying and inverting mod p. 

I will follow the neat exposition of Professor Spickler of Salisbury State University, 
which he made available on the Web: facultyfp.salisbury.edu/despickler/personal/index.asp 


Modular Arithmetic with Matrices 


Addition, subtraction, and multiplication are all we need for Ax (matrix times vector).
To multiply mod p we can multiply the integers in A times the integers in x as usual,
and then replace every entry of Ax by its value mod p.


Key questions: When can we solve Ax = b (mod p)? Do we still have the four subspaces
C(A), N(A), C(A^T), N(A^T)? Are they still orthogonal in pairs? Is there still an inverse
matrix mod p whenever the determinant of A is nonzero mod p? I am happy to say
that the last three answers are yes (but the inverse question requires p to be a prime number).


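We can find A^{-1} (mod p) by Gauss-Jordan elimination, reducing [A I] to [I A^{-1}]
as in Section 2.5. Or we can use determinants and the cofactor matrix C in the formula
A^{-1} = (det A)^{-1} C^T. I will work mod 3 with a 2 by 2 integer matrix A:

$$[A \;\, I] = \begin{bmatrix} 2 & 0 & 1 & 0 \\ 2 & 1 & 0 & 1 \end{bmatrix} \rightarrow \begin{bmatrix} 2 & 0 & 1 & 0 \\ 0 & 1 & 2 & 1 \end{bmatrix} \xrightarrow{\text{multiply row 1 by } 2^{-1} = 2} \begin{bmatrix} 1 & 0 & 2 & 0 \\ 0 & 1 & 2 & 1 \end{bmatrix} = [I \;\, A^{-1}]$$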
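
The Gauss-Jordan steps on [A I] can be sketched in Python for any prime p. This is an illustrative sketch, not a library routine: the name `inverse_mod_p` is an assumption, and dividing by a pivot is done with `pow(pivot, -1, p)`, which needs Python 3.8 or later.

```python
def inverse_mod_p(A, p):
    """Gauss-Jordan on [A I] with all arithmetic mod a prime p."""
    n = len(A)
    # Build the augmented matrix [A I], reduced mod p.
    M = [[a % p for a in row] + [int(i == j) for j in range(n)]
         for i, row in enumerate(A)]
    for col in range(n):
        # Find a row at or below the diagonal with a nonzero entry in this column.
        pivot = next((r for r in range(col, n) if M[r][col] != 0), None)
        if pivot is None:
            raise ValueError("matrix is singular mod p")
        M[col], M[pivot] = M[pivot], M[col]
        inv = pow(M[col][col], -1, p)              # inverse of the pivot mod p
        M[col] = [(inv * x) % p for x in M[col]]   # scale the pivot row to get a 1
        for r in range(n):
            if r != col and M[r][col] != 0:
                factor = M[r][col]
                M[r] = [(x - factor * y) % p for x, y in zip(M[r], M[col])]
    return [row[n:] for row in M]                  # right half is the inverse

print(inverse_mod_p([[2, 0], [2, 1]], 3))
```

The order of the row operations differs slightly from the display above (the pivot row is scaled first), but the result is the same inverse mod 3.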




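By pure chance A^{-1} = A! Multiplying A times A mod 3 does give the identity matrix:

$$A A^{-1} = \begin{bmatrix} 2 & 0 \\ 2 & 1 \end{bmatrix} \begin{bmatrix} 2 & 0 \\ 2 & 1 \end{bmatrix} = \begin{bmatrix} 4 & 0 \\ 6 & 1 \end{bmatrix} \equiv \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} \pmod 3$$

The determinant of A is 2, and the cofactor formula from Section 5.3 also gives A^{-1} = A:

$$A^{-1} = (\det A)^{-1} C^{\mathrm T} = 2^{-1} \begin{bmatrix} 1 & 0 \\ -2 & 2 \end{bmatrix} \equiv 2 \begin{bmatrix} 1 & 0 \\ -2 & 2 \end{bmatrix} \equiv \begin{bmatrix} 2 & 0 \\ 2 & 1 \end{bmatrix} \pmod 3$$


Theorem. A^{-1} exists mod p if and only if (det A)^{-1} exists mod p.
The requirement is: det A and p have no common factors.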
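
The theorem can be tried on 2 by 2 matrices with the cofactor formula. The sketch below is illustrative (the name `inverse_2x2_mod` is an assumption); it works for any modulus m, prime or not, and returns None when det A and m share a factor. It uses `pow(det, -1, m)` from Python 3.8 or later.

```python
from math import gcd

def inverse_2x2_mod(A, m):
    """Cofactor formula A^{-1} = (det A)^{-1} C^T for a 2 by 2 matrix, mod m."""
    (a, b), (c, d) = A
    det = (a * d - b * c) % m
    if gcd(det, m) != 1:
        return None                      # det A has no inverse mod m
    det_inv = pow(det, -1, m)
    return [[(det_inv * d) % m, (-det_inv * b) % m],
            [(-det_inv * c) % m, (det_inv * a) % m]]

print(inverse_2x2_mod([[2, 0], [2, 1]], 3))    # the matrix A from the text, mod 3
print(inverse_2x2_mod([[2, 0], [0, 1]], 26))   # det = 2 shares a factor with 26
```

The second call returns None because the determinant 2 and the modulus 26 are both even.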


Encryption with the Hill Cipher 


The original cipher used the letters A to Z with p = 26. Hill chose an n by n encryption
matrix E so that det E is not divisible by 2 or 13. Then the number det E has an inverse
mod 26 and so does the matrix E. The inverse matrix E^{-1} ≡ D (mod 26) will be the
decryption matrix that decodes the message.

Now convert each letter of the message into a number from 0 to 25. The obvious choice 
from A = 0 to Z = 25 is acceptable because the matrix will make this cipher stronger. 


Ignore spaces and divide the message into blocks v_1, v_2, ... of size n.
Then multiply each message block (mod p) by the encryption matrix E.
The coded message is Ev_1, Ev_2, ... and you know what the decoder will do.


Spickler’s example has det E = 583 ≡ 11 (mod 26) and this decryption matrix D:

$$D = E^{-1} = \begin{bmatrix} 2 & 3 & 15 \\ 5 & 8 & 12 \\ 1 & 13 & 4 \end{bmatrix}^{-1} \equiv \begin{bmatrix} 10 & 19 & 16 \\ 4 & 23 & 7 \\ 17 & 5 & 19 \end{bmatrix} \pmod{26}$$
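
The encoding and decoding steps can be sketched in Python with the 3 by 3 matrices E and D from this example. The function names and the sample message are illustrative assumptions; the sketch uses A = 0, ..., Z = 25 and requires the message length to be a multiple of the block size.

```python
E = [[2, 3, 15], [5, 8, 12], [1, 13, 4]]      # encryption matrix from the example
D = [[10, 19, 16], [4, 23, 7], [17, 5, 19]]   # its inverse mod 26

def apply_matrix(M, block, p=26):
    """Multiply one message block (a list of numbers) by M, mod p."""
    return [sum(m * v for m, v in zip(row, block)) % p for row in M]

def hill(text, M, p=26):
    """Encode or decode text (capital letters A-Z only) in blocks of size len(M)."""
    n = len(M)
    numbers = [ord(ch) - ord("A") for ch in text]   # A = 0, ..., Z = 25
    if len(numbers) % n != 0:
        raise ValueError("message length must be a multiple of the block size")
    out = []
    for start in range(0, len(numbers), n):
        out.extend(apply_matrix(M, numbers[start:start + n], p))
    return "".join(chr(v + ord("A")) for v in out)

plain = "LINEAR"               # hypothetical message, two blocks of three letters
coded = hill(plain, E)
print(coded, hill(coded, D))   # the second string is the original message
```

The same function does both jobs: multiplying by E encodes and multiplying by D decodes, because DE is the identity mod 26.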


Of course a codebreaker will not know E or D. And the block size n is generally
unknown too. For the matrices Hill had in mind n would not be very large and a computer
could quickly discover E and D.

I am not sure if Hill’s Cipher could become seriously difficult to break by choosing
very large matrices and a large prime number p, and by encoding the coded message a
second time, using a different block size n_2 and large matrix E_2 and large prime p_2.


Finite Fields and Finite Vector Spaces 


In algebra, a field F is a set of scalars that can be added and multiplied and inverted 
(except 0 can’t be inverted). Familiar examples are the real numbers R and the complex 
numbers C and the rational numbers Q (containing every ratio p/q of integers). From a 
field you build vectors v = (f_1, f_2, ..., f_n). From linear combinations of vectors you
build vector spaces. So linear algebra begins with a field F. 

I taught for ten years from a textbook that started with fields. On the way to R^n, we
lost a lot of students. That was a signal: the emphasis was misplaced if we wanted the
course to be useful. I believe the right way is to understand R^n and its subspaces first,
as you do. Then you can look at other fields and vector spaces with a natural question
in mind: What is new when the field is not R?

These pages are asking that question for finite fields. The possibilities become more
limited but also highly interesting. The starting point (and not quite the ending point) is
the finite field F_p. It contains only the numbers 0, 1, ..., p - 1 and p is a prime number.
I will focus first on the field F_2 with only 2 members “0” and “1”. You could think of 0
and 1 as “even” and “odd” because the rules to add and multiply are obeyed by the even
numbers and odd numbers: even + odd = odd and even × odd = even.


Addition   + | 0  1        Multiplication   × | 0  1
table      0 | 0  1        table            0 | 0  0
           1 | 1  0                         1 | 0  1


This is addition and multiplication “mod 2”.

From this field F_2 we can build vectors like v = (0, 0, 1) and w = (1, 0, 1). There are
three components with two choices each: a total of 2^3 = 8 different vectors in the vector
space (F_2)^3. You know the requirements on a subspace and the possibilities it opens up:


a) The zero-dimensional subspace containing only 0 = (0,0, 0). 
b) One-dimensional subspaces containing 0 and a vector like v. Notice v + v = 0! 
c) Two-dimensional subspaces with a basis like v and w and 4 vectors 0, v, w, v + w. 


d) The full three-dimensional subspace (F_2)^3 with 8 vectors.


What are the possible bases for (F_2)^3? The standard basis contains (1, 0, 0) and (0, 1, 0)
and (0, 0, 1). Those vectors are linearly independent and they span (F_2)^3. Their eight
combinations with coefficients 0 and 1 fill all of (F_2)^3.
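
The subspaces listed above are small enough to enumerate. This Python sketch (the name `span_mod2` is illustrative) forms every combination of given vectors with coefficients 0 and 1, adding mod 2.

```python
from itertools import product

def span_mod2(vectors):
    """All combinations c1*v1 + ... + ck*vk with coefficients 0 or 1, added mod 2."""
    n = len(vectors[0])
    combos = set()
    for coeffs in product((0, 1), repeat=len(vectors)):
        combo = tuple(sum(c * vec[i] for c, vec in zip(coeffs, vectors)) % 2
                      for i in range(n))
        combos.add(combo)
    return combos

v, w = (0, 0, 1), (1, 0, 1)
print(sorted(span_mod2([v])))       # 0 and v
print(sorted(span_mod2([v, w])))    # 0, v, w, v + w
print(len(span_mod2([(1, 0, 0), (0, 1, 0), (0, 0, 1)])))   # all 8 vectors
```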

What about matrices that multiply those vectors? The matrices will be 1 by 3, or 2 by 
3, or 3 by 3. When they are 3 by 3 we can ask if they are invertible. Their determinants 
can only be 0 (singular matrix) or 1 (invertible matrix). Let me leave you the pleasure of 
deciding whether these matrices are invertible. And how would you find the inverse ? 


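$$A = \begin{bmatrix} 1 & 0 & 0 \\ 1 & 1 & 0 \\ 1 & 1 & 1 \end{bmatrix} \qquad B = \begin{bmatrix} 1 & 1 & 0 \\ 0 & 1 & 1 \\ 1 & 0 & 1 \end{bmatrix} \qquad C = \begin{bmatrix} 1 & 1 & 1 \\ 0 & 0 & 1 \\ 1 & 0 & 0 \end{bmatrix}$$


Out of 2^9 possible 3 by 3 matrices over F_2, I will guess that most are singular.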
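
The guess can be checked by brute force, since there are only 512 matrices to try. The sketch below uses illustrative function names and counts the 3 by 3 matrices of 0’s and 1’s whose determinant is 1 (mod 2).

```python
from itertools import product

def det3_mod2(M):
    """Determinant of a 3 by 3 matrix, reduced mod 2."""
    (a, b, c), (d, e, f), (g, h, i) = M
    return (a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)) % 2

def count_invertible_3x3_mod2():
    """Count the 3 by 3 matrices of 0's and 1's with determinant 1 (mod 2)."""
    count = 0
    for entries in product((0, 1), repeat=9):
        M = (entries[0:3], entries[3:6], entries[6:9])
        count += det3_mod2(M)      # adds 1 for invertible, 0 for singular
    return count

invertible = count_invertible_3x3_mod2()
print(invertible, "invertible out of", 2 ** 9)
```

The count should agree with the standard formula (2^3 - 1)(2^3 - 2)(2^3 - 4) = 168 for invertible 3 by 3 matrices over F_2, which is fewer than half of 512.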


To conclude this discussion of F_2, I mention a field with 2^2 = 4 members. It will not
come from multiplication (mod 4), because 4 is not prime. The multiplication 2 times 2
will give 0 (and 2 has no inverse): not a field. But we can start with the numbers 0 and 1
in F_2 and invent two more numbers a and 1 + a, provided they follow these two rules:
(a + a = 0) and (a × a = 1 + a). Then a and 1 + a are inverses. Not obvious!

[Addition and multiplication tables for the four members 0, 1, a, 1 + a]


Beyond p = 2, we have the fields F_p for all prime numbers p. They use addition and
multiplication mod p. They are alphabets for codes. They provide the components for
vectors v = (f_1, ..., f_n) in the space (F_p)^n. They provide the entries for matrices that
multiply those vectors. These fields F_p are the most frequently used finite fields.


The only other finite fields have p^n members. The example above of 0, 1, a, 1 + a
had 2^2 = 4 members. We will leave it there and get back safely to R.
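
For readers who want to test the four-member field with members 0, 1, a, 1 + a, here is a small Python sketch. The representation is an assumption made for illustration: each member c0 + c1 a is stored as the pair (c0, c1), and the two rules a + a = 0 and a × a = 1 + a are built into `add` and `multiply`.

```python
# Each member c0 + c1*a is stored as the pair (c0, c1) with c0, c1 in {0, 1}.
ZERO, ONE, A, ONE_PLUS_A = (0, 0), (1, 0), (0, 1), (1, 1)
NAMES = {ZERO: "0", ONE: "1", A: "a", ONE_PLUS_A: "1+a"}

def add(x, y):
    """Add componentwise mod 2, so that a + a = 0."""
    return ((x[0] + y[0]) % 2, (x[1] + y[1]) % 2)

def multiply(x, y):
    """Multiply (c0 + c1 a)(d0 + d1 a) and replace a*a by 1 + a."""
    c0, c1 = x
    d0, d1 = y
    constant = c0 * d0 + c1 * d1            # a*a contributes 1
    a_part = c0 * d1 + c1 * d0 + c1 * d1    # a*a contributes a
    return (constant % 2, a_part % 2)

members = (ZERO, ONE, A, ONE_PLUS_A)
for x in members:
    print(NAMES[x], [NAMES[multiply(x, y)] for y in members])
```

Each printed row is one row of the multiplication table. The row for a contains a 1 in the column for 1 + a, so those two members are inverses.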


Problem Set 10.7


1 If you multiply n whole numbers (even or odd) when is the answer odd? Translate
into multiplication (mod 2): If you multiply 0’s and 1’s when is the answer 1?


2 If you add n whole numbers (even or odd) when is the sum of the numbers odd?
Translate into adding 0’s and 1’s (mod 2). When do they add to 1?


3 (a) If y_1 ≡ x_1 and y_2 ≡ x_2, why is y_1 + y_2 ≡ x_1 + x_2? All are mod p.
Suggestion: y_1 = p q_1 + x_1 and y_2 = p q_2 + x_2. Now add y_1 + y_2.


(b) Can you be sure that x_1 + x_2 is smaller than p? No. Give an example where
there is a smaller x with (y_1 + y_2) ≡ x (mod p).


4 p = 39 is not prime. Find a number a that has no inverse z (mod 39). This means
that az ≡ 1 (mod 39) has no solution. Then find a 2 by 2 matrix A that has no
inverse matrix Z (mod 39). This means that AZ ≡ I (mod 39) has no solution.


5 Show that y ≡ x (mod p) leads to -y ≡ -x (mod p).

6 Find a matrix that has independent columns in R^n but dependent columns (mod 5).

7 What are all the 2 by 2 matrices of 0’s and 1’s that are invertible (mod 2)?


8 Is the row space of A still orthogonal to the nullspace in modular arithmetic (mod 11)?
Are bases for those subspaces still bases (mod 11)?


9 (Hill’s Cipher) Separate the message THISWHOLEBOOKISINCODE into blocks
of 3 letters. Replace each letter by a number from 1 to 26 (normal order). Multiply
each block by the 3 by 3 matrix L with 1’s on and below the diagonal. What is the
coded message (in numbers) and how would you decode it?


10 Suppose you know the original message (the plaintext). Suppose you also see the
coded message. How would you start to discover the matrix in Hill’s Cipher? For a
very long message do you expect success?


Chapter 11 


Numerical Linear Algebra 


The goals of numerical linear algebra are speed and accuracy and stability : n > 10° or 10°. 
Matrices can be full or sparse or banded or structured: special algorithms for each. 

Accuracy of elimination is controlled by the condition number ||A|| ||A^{-1}||.

Gram-Schmidt is often computed by using Householder reflections H = I - 2uu^T to find Q.
Eigenvalues use QR iterations A_0 = Q_0 R_0 → R_0 Q_0 = A_1 = Q_1 R_1 → ... → A_n.

Shifted QR is even better: Shift to A_k - c_k I = Q_k R_k, shift back A_{k+1} = R_k Q_k + c_k I.
Iteration S x_{k+1} = b - T x_k solves (S + T)x = b if all eigenvalues of S^{-1}T have |λ| < 1.


Iterative methods often use preconditioners P. Change Ax = b to PAx = Pb with PA ≈ I.


Conjugate gradients and GMRES are Krylov methods; see Trefethen-Bau (and other texts). 


11.1 Gaussian Elimination in Practice 


Numerical linear algebra is a struggle for quick solutions and also accurate solutions. We 
need efficiency but we have to avoid instability. In Gaussian elimination, the main freedom 
(always available) is to exchange equations. This section explains when to exchange rows 
for the sake of speed, and when to do it for the sake of accuracy. 

The key to accuracy is to avoid unnecessarily large numbers. Often that requires us to 
avoid small numbers! A small pivot generally means large multipliers (since we divide by 
the pivot). A good plan is “partial pivoting”, to choose the largest available pivot in each 
new column. We will see why this pivoting strategy is built into computer programs. 

Other row exchanges are done to save elimination steps. In practice, most large matrices
are sparse: almost all entries are zeros. Elimination is fastest when the equations are
ordered to produce a narrow band of nonzeros. Zeros inside the band “fill in” during
elimination; those zeros are destroyed and don’t save computing time.

Section 11.2 is about instability that can’t be avoided. It is built into the problem,
and this sensitivity is measured by the “condition number”. Then Section 11.3 describes
how to solve Ax = b by iterations. Instead of direct elimination, the computer solves
an easier equation many times. Each answer x_k leads to the next guess x_{k+1}. For good
iterations (the conjugate gradient method is extremely good), the x_k converge quickly to
x = A^{-1}b.


The Fastest Supercomputer 


A new supercomputing record was announced by IBM and Los Alamos on May 20, 2008. 
The Roadrunner was the first to achieve a quadrillion (10^15) floating-point operations per
second: a petaflop machine. The benchmark for this world record was a large dense linear 
system Ax = b: computer speed is tested by linear algebra. 

That machine was shut down in 2013! The TOP500 project ranks the 500 most powerful
computer systems in the world. As I write this page in October 2015, the first four are from 
NUDT in China, Cray and IBM in the US, and Fujitsu in Japan. They all use a LINUX- 
based system. And all vector processors have fallen out of the top 500. 

Looking ahead, the Summit is expected to take first place with 150-300 petaflops. 
President Obama has just ordered the development of an exascale system (1000 petaflops). 
Up to now we are following Moore’s Law of doubling every 14 months. 

The LAPACK software does elimination with partial pivoting. The biggest difference 
from this book is to organize the steps to use large submatrices and never single numbers. 
And graphics processing units (GPU’s) are now almost required for success. The market for 
video games dwarfs scientific computing and led to astonishing acceleration in the chips. 

Before IBM’s BlueGene, a key issue was to count the standard quad-core processors 
that a petaflop machine would need: 32,000. The new architecture uses much less power, 
but its hybrid design has a price: a code needs three separate compilers and explicit instruc- 
tions to move all the data. Please see the excellent article in SIAM News (siam.org, July 
2008) and the update on www.lanl.gov/roadrunner. 


Our thinking about matrix calculations is reflected in the highly optimized BLAS 
(Basic Linear Algebra Subroutines). They come at levels 1, 2, and 3: 


Level 1   Linear combinations of vectors au + v: O(n) work
Level 2   Matrix-vector multiplications Au + v: O(n^2) work
Level 3   Matrix-matrix multiplications AB + C: O(n^3) work


Level 1 is an elimination step (multiply row j by ℓ_ij and subtract from row i). Level 2
can eliminate a whole column at once. A high performance solver is rich in Level 3 BLAS
(AB has 2n^3 flops and 2n^2 data, a good ratio of work to talk).

It is data passing and storage retrieval that limit the speed of parallel processing. The 
high-velocity cache between main memory and floating-point computation has to be fully 
used! Top speed demands a block matrix approach to elimination. 

The big change, coming now, is parallel processing at the chip level.


Roundoff Error and Partial Pivoting


Up to now, any pivot (nonzero of course) was accepted. In practice a small pivot is dangerous.
A catastrophe can occur when numbers of different sizes are added. Computers keep a
fixed number of significant digits (say three decimals, for a very weak machine). The sum
10,000 + 1 is rounded off to 10,000. The “1” is completely lost. Watch how that changes
the solution to this problem:

$$\begin{aligned} .0001u + v &= 1 \\ -u + v &= 0 \end{aligned} \qquad \text{starts with coefficient matrix} \qquad A = \begin{bmatrix} .0001 & 1 \\ -1 & 1 \end{bmatrix}.$$


If we accept .0001 as the pivot, elimination adds 10,000 times row 1 to row 2. Roundoff 
leaves 
10,000v = 10,000 instead of 10,001v = 10,000. 


The computed answer v = 1 is near the true v = .9999. But then back substitution puts the 
wrong v = 1 into the equation for u: 


.0001u + 1 = 1 instead of .0001u + .9999 = 1.


The first equation gives u = 0. The correct answer (look at the second equation) is u = 
1.000. By losing the “1” in the matrix, we have lost the solution. The small change from 
10,001 to 10,000 has changed the answer from u = 1 to u = 0 (100% error!). 

If we exchange rows, even this weak computer finds an answer that is correct to 3 places: 


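$$\begin{aligned} -u + v &= 0 \\ .0001u + v &= 1 \end{aligned} \quad\rightarrow\quad \begin{aligned} -u + v &= 0 \\ v &= 1 \end{aligned} \quad\rightarrow\quad \begin{aligned} u &= 1 \\ v &= 1. \end{aligned}$$

The original pivots were .0001 and 10,000 (badly scaled). After a row exchange the exact
pivots are -1 and 1.0001 (well scaled). The computed pivots -1 and 1 come close to the
exact values. Small pivots bring numerical instability, and the remedy is partial pivoting.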
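
The weak three-digit machine can be imitated in Python. This is a toy model under stated assumptions: the helper `fl` rounds every intermediate result to three significant decimal digits, and `eliminate_2x2` (an illustrative name) always uses the first row it is given as the pivot row.

```python
from math import floor, log10

def fl(x, digits=3):
    """Round x to `digits` significant decimal digits (a toy floating-point model)."""
    if x == 0:
        return 0.0
    exponent = floor(log10(abs(x)))
    return round(x, digits - 1 - exponent)

def eliminate_2x2(row1, row2, digits=3):
    """Solve a11 u + a12 v = b1 and a21 u + a22 v = b2 with row1 as the pivot row."""
    a11, a12, b1 = row1
    a21, a22, b2 = row2
    m = fl(a21 / a11, digits)                            # multiplier
    pivot2 = fl(a22 - fl(m * a12, digits), digits)       # second pivot after roundoff
    rhs2 = fl(b2 - fl(m * b1, digits), digits)
    v = fl(rhs2 / pivot2, digits)                        # back substitution
    u = fl(fl(b1 - fl(a12 * v, digits), digits) / a11, digits)
    return u, v

small_pivot = eliminate_2x2((0.0001, 1, 1), (-1, 1, 0))   # accept .0001 as pivot
exchanged = eliminate_2x2((-1, 1, 0), (0.0001, 1, 1))     # rows exchanged first
print(small_pivot, exchanged)
```

With the small pivot the computed u is 0; after the row exchange both u and v come out as 1, matching the hand calculation.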


Here is our strategy when we reach and search column k for the best available pivot:
Choose the largest number (in absolute value) in row k or below. Exchange its row with row k.
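
That strategy can be written out as a short solver. The following is a plain Python sketch for small dense systems, with an illustrative function name; it is not how LAPACK organizes the work (which uses blocks), only the same pivoting rule.

```python
def solve_partial_pivoting(A, b):
    """Solve Ax = b by elimination, exchanging rows to use the largest pivot."""
    n = len(A)
    # Augmented matrix [A b] as floats, so the inputs are not modified.
    M = [list(map(float, row)) + [float(rhs)] for row, rhs in zip(A, b)]
    for k in range(n):
        # Search column k, in row k or below, for the entry largest in absolute value.
        p = max(range(k, n), key=lambda i: abs(M[i][k]))
        if M[p][k] == 0.0:
            raise ValueError("matrix is singular")
        M[k], M[p] = M[p], M[k]                   # row exchange
        for i in range(k + 1, n):
            m = M[i][k] / M[k][k]                 # multiplier, never larger than 1
            for j in range(k, n + 1):
                M[i][j] -= m * M[k][j]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):                # back substitution
        s = M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = s / M[i][i]
    return x

print(solve_partial_pivoting([[0.0001, 1], [-1, 1]], [1, 0]))
```

For the 2 by 2 example the solver exchanges the rows first and returns values close to u = v = .9999.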


The strategy of complete pivoting looks also in later columns for the largest pivot. It ex- 
changes columns as well as rows. This expense is seldom justified, and all major codes 
use partial pivoting. Multiplying a row or column by a scaling constant can also be very 
worthwhile. If the first equation above is u + 10,000v = 10,000 and we don’t rescale, 
then 1 looks like a good pivot and we would miss the essential row exchange. 

For positive definite matrices, row exchanges are not required. It is safe to accept 
the pivots as they appear. Small pivots can occur, but the matrix is not improved by row 
exchanges. When its condition number is high, the problem is in the matrix and not in the 
code. In this case the output is unavoidably sensitive to the input. 


The reader now understands how a computer actually solves Ax = b: by elimination
with partial pivoting. Compared with the theoretical description (find A^{-1} and multiply
A^{-1}b) the details took time. But in computer time, elimination is much faster. I believe
that elimination is also the best approach to the algebra of row spaces and nullspaces.


Operation Counts: Full Matrices


Here is a practical question about cost. How many separate operations are needed to solve
Ax = b by elimination? This decides how large a problem we can afford.

Look first at A, which changes gradually into U. When a multiple of row 1 is subtracted
from row 2, we do n operations. The first is a division by the pivot, to find the multiplier ℓ.
For the other n - 1 entries along the row, the operation is a “multiply-subtract”. For convenience,
we count this as a single operation. If you regard multiplying by ℓ and subtracting
from the existing entry as two separate operations, multiply all our counts by 2.

The matrix A is n by n. The operation count applies to all n - 1 rows below the first.
Thus it requires n times n - 1 operations, or n^2 - n, to produce zeros below the first pivot.
Check: All n^2 entries are changed, except the n entries in the first row.

When elimination is down to k equations, the rows are shorter. We need only k^2 - k
operations (instead of n^2 - n) to clear out the column below the pivot. This is true for
1 ≤ k ≤ n. The last step requires no operations (1^2 - 1 = 0); forward elimination is
complete. The total count to reach U is the sum of k^2 - k over all values of k from 1 to n:

$$(1^2 + \cdots + n^2) - (1 + \cdots + n) = \frac{n(n+1)(2n+1)}{6} - \frac{n(n+1)}{2} = \frac{n^3 - n}{3}.$$

Those are known formulas for the sum of the first n numbers and their squares. Substituting
n = 100 gives a million minus a hundred, then divide by 3. (That translates into one
second on a workstation.) We will ignore n in comparison with n^3, to reach our main
conclusion:


The multiply-subtract count is n^3/3 for forward elimination (A to U, producing L).


That means n^3/3 multiplications and subtractions. Doubling n increases this cost by eight
(because n is cubed). 100 equations are easy, 1000 are more expensive, 10000 dense equations
are close to impossible. We need a faster computer or a lot of zeros or a new idea.
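
The count for forward elimination can be confirmed by walking through the loops and counting, without doing any arithmetic on matrix entries. A short Python sketch, with an illustrative function name:

```python
def count_elimination_operations(n):
    """Count divisions and multiply-subtracts in forward elimination on a full n by n matrix."""
    operations = 0
    for pivot in range(n):                 # pivot column
        for row in range(pivot + 1, n):    # each row below the pivot
            operations += 1                # one division to find the multiplier
            operations += n - pivot - 1    # multiply-subtract along the rest of the row
    return operations

for n in (10, 100, 200):
    print(n, count_elimination_operations(n), (n**3 - n) // 3)
```

The two printed counts agree for every n, and going from n = 100 to n = 200 multiplies the count by about eight.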
On the right side of the equations, the steps go much faster. We operate on single
numbers, not whole rows. Each right side needs exactly n^2 operations. Down and back
up we are solving two triangular systems, Lc = b forward and Ux = c backward. In back
substitution, the last unknown needs only division by the last pivot. The equation above
it needs two operations: substituting x_n and dividing by its pivot. The kth step needs k
multiply-subtract operations, and the total for back substitution is

$$1 + 2 + \cdots + n = \frac{n(n+1)}{2} \approx \frac{1}{2}n^2 \text{ operations.}$$

The forward part is similar. The n^2 total exactly equals the count for multiplying A^{-1}b!
This leaves Gaussian elimination with two big advantages over A^{-1}b:


1 Elimination requires n^3/3 multiply-subtracts, compared to n^3 for A^{-1}.


2 If A is banded so are L and U: by comparison A^{-1} is full of nonzeros.


Band Matrices


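These counts are improved when A has “good zeros”. A good zero is an entry that remains
zero in L and U. The best zeros are at the beginning of a row. They require no elimination
steps (the multipliers are zero). So we also find those same good zeros in L. That is
especially clear for this tridiagonal matrix A (and for band matrices in Figure 11.1):

$$\underbrace{\begin{bmatrix} 1 & -1 & & \\ -1 & 2 & -1 & \\ & -1 & 2 & -1 \\ & & -1 & 2 \end{bmatrix}}_{\text{Tridiagonal}} = \underbrace{\begin{bmatrix} 1 & & & \\ -1 & 1 & & \\ & -1 & 1 & \\ & & -1 & 1 \end{bmatrix}}_{\text{Bidiagonal}} \underbrace{\begin{bmatrix} 1 & -1 & & \\ & 1 & -1 & \\ & & 1 & -1 \\ & & & 1 \end{bmatrix}}_{\text{times bidiagonal}}$$


Figure 11.1: A = LU for a band matrix. Good zeros in A stay zero in L and U.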


These zeros lead to a complete change in the operation count, for “half-bandwidth” w:
A band matrix has a_ij = 0 when |i - j| ≥ w.


Thus w = 1 for a diagonal matrix, w = 2 for tridiagonal, w = n for dense. The length of
the pivot row is at most w. There are no more than w - 1 nonzeros below any pivot. Each
stage of elimination is complete after w(w - 1) operations, and the band structure survives.
There are n columns to clear out. Therefore:


Elimination on a band matrix (A to L and U) needs less than w^2 n operations.


For a band matrix, the count is proportional to n instead of n^3. It is also proportional to w^2.
A full matrix has w = n and we are back to n^3. For an exact count, remember that the
bandwidth drops below w in the lower right corner (not enough space):

$$\text{Band} \quad \frac{w(w-1)(3n-2w+1)}{3} \qquad\qquad \text{Dense} \quad \frac{n(n-1)(n+1)}{3} = \frac{n^3-n}{3}$$

On the right side of Ax = b, to find x from b, the cost is about 2wn (compared to the
usual n^2). Main point: For a band matrix the operation counts are proportional to n.
This is extremely fast. A tridiagonal matrix of order 10,000 is very cheap, provided
we don’t compute A^{-1}. That inverse matrix has no zeros at all:

$$A = \begin{bmatrix} 1 & -1 & 0 & 0 \\ -1 & 2 & -1 & 0 \\ 0 & -1 & 2 & -1 \\ 0 & 0 & -1 & 2 \end{bmatrix} \quad \text{has} \quad A^{-1} = U^{-1}L^{-1} = \begin{bmatrix} 4 & 3 & 2 & 1 \\ 3 & 3 & 2 & 1 \\ 2 & 2 & 2 & 1 \\ 1 & 1 & 1 & 1 \end{bmatrix}$$


We are actually worse off knowing A^{-1} than knowing L and U. Multiplication by A^{-1}
needs the full n^2 steps. Solving Lc = b and Ux = c needs only 2wn.

A band structure is very common in practice, when the matrix reflects connections
between near neighbors: a_13 = 0 and a_14 = 0 because 1 is not a neighbor of 3 and 4.
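
For the tridiagonal case the whole solve fits in a few lines, and the work is clearly proportional to n. The sketch below is illustrative: the function name and the three-list storage of the diagonals are assumptions, and it does no row exchanges, so it assumes every pivot is nonzero (as for the 4 by 4 tridiagonal example in this section).

```python
def solve_tridiagonal(lower, diag, upper, b):
    """Solve a tridiagonal system in O(n) steps: elimination, then back substitution.

    lower and upper hold the n - 1 entries below and above the main diagonal.
    """
    n = len(diag)
    d = list(map(float, diag))     # becomes the pivots (diagonal of U)
    c = list(map(float, b))        # becomes the right side after forward elimination
    for i in range(1, n):
        m = lower[i - 1] / d[i - 1]        # the single multiplier in row i
        d[i] -= m * upper[i - 1]
        c[i] -= m * c[i - 1]
    x = [0.0] * n
    x[-1] = c[-1] / d[-1]
    for i in range(n - 2, -1, -1):         # back substitution with bidiagonal U
        x[i] = (c[i] - upper[i] * x[i + 1]) / d[i]
    return x

# The 4 by 4 tridiagonal example, with b chosen so that x = (1, 2, 3, 4).
print(solve_tridiagonal([-1, -1, -1], [1, 2, 2, 2], [-1, -1, -1], [-1, 0, 0, 5]))
```

Only the three diagonals are stored, so a system of order 10,000 needs about 30,000 numbers instead of the 100 million entries of the full inverse.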
We close with counts for Gauss-Jordan and Gram-Schmidt-Householder: 


A^{-1} costs n^3 multiply-subtract steps.      QR costs (2/3)n^3 steps.


In AA^{-1} = I, the jth column of A^{-1} solves Ax_j = jth column of I. The left side costs
n^3/3 as usual. (This is a one-time cost! L and U are not repeated.) The special saving for the
jth column of I comes from its first j - 1 zeros. No work is required on the right side until
elimination reaches row j. The forward cost is (n - j)^2/2 instead of n^2/2. Summing over
j, the total for forward elimination on the n right sides is n^3/6. The final multiply-subtract
count for A^{-1} is n^3 if we actually want the inverse:

$$\text{For } A^{-1}: \qquad \frac{n^3}{3}\ (L \text{ and } U) + \frac{n^3}{6}\ (\text{forward}) + n\left(\frac{n^2}{2}\right)\ (\text{back substitutions}) = n^3. \qquad (1)$$


Orthogonalization (A to Q): The key difference from elimination is that each multiplier 
is decided by a dot product. That takes n operations, where elimination just divides by 
the pivot. Then there are n “multiply-subtract” operations to remove from column k its 
projection along column j < k (see Section 4.4). The combined cost is 2n where for 
elimination it is n. This factor 2 is the price of orthogonality. We are changing a dot 
product to zero where elimination changes an entry to zero. 


Caution To judge a numerical algorithm, it is not enough to count the operations. Beyond 
“flop counting” is a study of stability (Householder wins) and the flow of data. 


Reordering Sparse Matrices 


For band matrices with constant width w, the row ordering is optimal. But for most sparse 
matrices in real computations, the width of the band is not constant and there are many 
zeros inside the band. Those zeros can fill in as elimination proceeds—they are lost. We 
need to renumber the equations to reduce fill-in, and thereby speed up elimination. 

Generally speaking, we want to move zeros to early rows and columns. Later rows
and columns are shorter anyway. The “approximate minimum degree” algorithm in sparse
MATLAB is greedy: it chooses the row to eliminate without counting all the consequences.
We may reach a nearly full matrix near the end, but the total operation count to reach LU
is still much smaller. To find the absolute minimum of nonzeros in L and U is an NP-hard
problem, much too expensive, and amd is a good compromise.

Fill-in is famous when each point on a square grid is connected to its four nearest
neighbors. It is impossible to number all the gridpoints so that neighbors stay together! If
we number by rows of the grid, there is a long wait to come around to the gridpoint above.

[Figure: a small matrix and its graph of nonzeros, before and after one elimination step. The entry a_32 = 0 before elimination becomes a_32 ≠ 0 after.]

We only need the positions of the nonzeros, not their exact values. Think of the graph
of nonzeros: Node i is connected to node j if a_ij ≠ 0. Watch to see how elimination can
create nonzeros (new edges), which we are trying to avoid.
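
The graph picture leads to a simple way to count fill-in without any arithmetic. In this Python sketch (illustrative names, symmetric nonzero pattern assumed) eliminating a node connects all of its remaining neighbors to each other, and every edge that was not already present counts as one fill-in.

```python
def count_fill_in(adjacency, order):
    """Count new edges created when nodes are eliminated in the given order."""
    graph = {node: set(neighbors) for node, neighbors in adjacency.items()}
    fill = 0
    for node in order:
        neighbors = graph.pop(node)
        for other in neighbors:
            graph[other].discard(node)       # the eliminated node leaves the graph
        remaining = sorted(neighbors)
        for idx, p in enumerate(remaining):
            for q in remaining[idx + 1:]:
                if q not in graph[p]:        # a zero entry becomes nonzero
                    graph[p].add(q)
                    graph[q].add(p)
                    fill += 1
    return fill

# An "arrow" pattern: node 0 is connected to every other node.
arrow = {0: {1, 2, 3, 4}, 1: {0}, 2: {0}, 3: {0}, 4: {0}}
print(count_fill_in(arrow, [0, 1, 2, 3, 4]))   # eliminating node 0 first
print(count_fill_in(arrow, [1, 2, 3, 4, 0]))   # eliminating node 0 last
```

Eliminating the well-connected node first fills in every pair among the other four nodes; eliminating it last creates no fill-in at all. That is the effect a good reordering is after.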

The command nnz(L) counts the nonzero multipliers in the lower triangular L, find(L)
will list them, and spy(L) shows them all.

The goal of colamd and symamd is a better ordering (permutation P) that reduces
fill-in for AP and P^T A P, by choosing the pivot with the fewest nonzeros below it.


Fast Orthogonalization 


There are three ways to reach the important factorization A = QR. Gram-Schmidt works 
to find the orthonormal vectors in Q. Then R is upper triangular because of the order of 
Gram-Schmidt steps. Now we look at better methods (Householder and Givens), which 
use a product of specially simple Q’s that we know are orthogonal. 

Elimination gives A = LU, orthogonalization gives A = QR. We don’t want a
triangular L, we want an orthogonal Q. L is a product of E’s from elimination, with
1’s on the diagonal and the multiplier ℓ_ij below. Q will be a product of orthogonal matrices.

There are two simple orthogonal matrices to take the place of the E’s. The reflection
matrices I - 2uu^T are named after Householder. The plane rotation matrices are named
after Givens. The simple matrix that rotates the xy plane by θ is Q_21:

$$\text{Givens rotation in the 1-2 plane} \qquad Q_{21} = \begin{bmatrix} \cos\theta & -\sin\theta & 0 \\ \sin\theta & \cos\theta & 0 \\ 0 & 0 & 1 \end{bmatrix}$$


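Use Q_21 the way you used E_21, to produce a zero in the (2, 1) position. That determines
the angle θ. Bill Hager gives this example in Applied Numerical Linear Algebra:

$$Q_{21}A = \begin{bmatrix} .6 & .8 & 0 \\ -.8 & .6 & 0 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} 90 & -153 & 114 \\ 120 & -79 & -223 \\ 200 & -40 & 395 \end{bmatrix} = \begin{bmatrix} 150 & -155 & -110 \\ 0 & 75 & -225 \\ 200 & -40 & 395 \end{bmatrix}$$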


The zero came from -.8(90) + .6(120). No need to find θ, what we needed was cos θ:

$$\cos\theta = \frac{90}{\sqrt{90^2 + 120^2}} \qquad \text{and} \qquad \sin\theta = \frac{-120}{\sqrt{90^2 + 120^2}} \qquad (2)$$

Now we attack the (3, 1) entry. The rotation will be in rows and columns 3 and 1. The
numbers cos θ and sin θ are determined from 150 and 200, instead of 90 and 120.


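$$Q_{31}Q_{21}A = \begin{bmatrix} .6 & 0 & .8 \\ 0 & 1 & 0 \\ -.8 & 0 & .6 \end{bmatrix} \begin{bmatrix} 150 & -155 & -110 \\ 0 & 75 & -225 \\ 200 & -40 & 395 \end{bmatrix} = \begin{bmatrix} 250 & -125 & 250 \\ 0 & 75 & -225 \\ 0 & 100 & 325 \end{bmatrix}$$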


One more step to R. The (3,2) entry has to go. The numbers cos @ and sin 0 now come 
from 75 and 100. The rotation is now in rows and columns 2 and 3: 


1 0 0 250 —125 - 250 —125 250 
Q32Q31Q21A = |V 6 8 o WA |= 0 125 125 
0 —.8 6 0 100 : 0 0 375 


We have reached the upper triangular R. What is Q? Move the plane rotations $Q_{ij}$ to the
other side to find A = QR, just as you moved the elimination matrices $E_{ij}$ to the other
side to find A = LU:

$$Q_{32}Q_{31}Q_{21}A = R \quad\text{means}\quad A = (Q_{21}^{-1}Q_{31}^{-1}Q_{32}^{-1})R = QR. \tag{3}$$

The inverse of each $Q_{ij}$ is $Q_{ij}^{T}$ (rotation through $-\theta$). The inverse of $E_{ij}$ was not an
orthogonal matrix! LU and QR are similar but L and Q are not the same.
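The three rotations can be replayed numerically. What follows is a minimal Python sketch, not code from Hager's book or from LAPACK; the function name `givens_qr` and the list-of-lists matrix format are assumptions made for illustration. It clears the entries below the diagonal one at a time, in the order (2,1), (3,1), (3,2), and accumulates the inverse rotations into Q.

```python
import math

def givens_qr(A):
    """Reduce a square matrix A to upper triangular R by plane rotations.

    Returns (Q, R) with A = Q R. Illustrative sketch only.
    """
    n = len(A)
    R = [[float(x) for x in row] for row in A]
    Q = [[float(i == j) for j in range(n)] for i in range(n)]
    for j in range(n):                 # column being cleared
        for i in range(j + 1, n):      # entry (i, j) below the diagonal
            a, b = R[j][j], R[i][j]
            r = math.hypot(a, b)
            if r == 0.0:
                continue               # nothing to rotate
            c, s = a / r, b / r        # e.g. 90/150 = .6 and 120/150 = .8
            # Rotate rows j and i of R: this puts a zero in position (i, j)
            for k in range(n):
                rj, ri = R[j][k], R[i][k]
                R[j][k] = c * rj + s * ri
                R[i][k] = -s * rj + c * ri
            # Multiply Q on the right by the inverse (transpose) rotation
            for k in range(n):
                qj, qi = Q[k][j], Q[k][i]
                Q[k][j] = c * qj + s * qi
                Q[k][i] = -s * qj + c * qi
    return Q, R
```

Calling `givens_qr` on the matrix with rows (90, -153, 114), (120, -79, -223), (200, -40, 395) should give, up to roundoff, the triangular R with rows (250, -125, 250), (0, 125, 125), (0, 0, 375).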

Householder reflections are faster than rotations because each one clears out a whole
column below the diagonal. Watch how the first column $a_1$ of A becomes column $r_1$ of R:

$$\text{Reflection by } H_1\qquad H_1 = I - 2u_1u_1^{T} \qquad H_1a_1 = \begin{bmatrix} \|a_1\| \\ 0 \\ \vdots \\ 0 \end{bmatrix} \text{ or } \begin{bmatrix} -\|a_1\| \\ 0 \\ \vdots \\ 0 \end{bmatrix} = r_1 \tag{4}$$

The length was not changed, and $u_1$ is in the direction of $a_1 - r_1$. We have n - 1 entries
in the unit vector $u_1$ to get n - 1 zeros in $r_1$. (Rotations had one angle $\theta$ to get one zero.)
When we reach column k, we have n - k available choices in the unit vector $u_k$.
This leads to n - k zeros in $r_k$. We just store the u’s and r’s to know the final Q and R:

$$\text{Inverse of } H_i \text{ is } H_i\qquad (H_{n-1}\cdots H_1)A = R \quad\text{means}\quad A = (H_1\cdots H_{n-1})R = QR. \tag{5}$$


This is how LAPACK improves on 19th century Gram-Schmidt. Q is exactly orthogonal. 
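As a rough illustration of one reflection per column, here is a minimal Python sketch. It is not LAPACK's routine; the name `householder_qr` and the sign choice for $r_k$ (opposite to the sign of the pivot entry, a common way to avoid cancellation) are assumptions of this sketch. Only the unit vectors u and the triangular R are stored.

```python
import math

def householder_qr(A):
    """Apply reflections H = I - 2 u u^T column by column.

    Returns (us, R): the list of unit vectors u_k and the upper triangular R.
    Illustrative sketch only.
    """
    m, n = len(A), len(A[0])
    R = [[float(x) for x in row] for row in A]
    us = []
    for k in range(min(m - 1, n)):
        x = [R[i][k] for i in range(k, m)]        # column k, from the diagonal down
        norm_x = math.sqrt(sum(t * t for t in x))
        if norm_x == 0.0:
            us.append(None)                       # column already zero
            continue
        alpha = -norm_x if x[0] >= 0 else norm_x  # first entry of r_k
        v = x[:]
        v[0] -= alpha                             # v = a - r
        norm_v = math.sqrt(sum(t * t for t in v))
        u = [t / norm_v for t in v]
        # Apply H to every remaining column: c <- c - 2 u (u^T c)
        for j in range(k, n):
            proj = sum(u[i] * R[k + i][j] for i in range(len(u)))
            for i in range(len(u)):
                R[k + i][j] -= 2.0 * proj * u[i]
        us.append(u)
    return us, R
```

For a 3 by 3 matrix this uses two reflections where the rotations needed three steps. The diagonal of R can differ in sign from the one produced by rotations, since A = QR is only fixed up to those signs.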

Section 11.3 explains how A = QR is used in the other big computation of linear
algebra: the eigenvalue problem. The factors QR are reversed to give $A_1 = RQ$ which is
$Q^{T}AQ$. Since $A_1$ is similar to A, the eigenvalues are unchanged. Then $A_1$ is factored into
$Q_1R_1$, and reversing the factors gives $A_2$. Amazingly, the entries below the diagonal get
smaller in $A_1, A_2, A_3, \ldots$ and we can identify the eigenvalues. This is the “QR method”
for $Ax = \lambda x$, a big success of numerical linear algebra.
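The reversal of factors can be tried on a small matrix. This is a minimal Python sketch of the unshifted iteration only; production codes add shifts and a preliminary reduction that are not shown here. The helper names are hypothetical, and modified Gram-Schmidt is used for the factorization purely to keep the sketch short.

```python
import math

def matmul(X, Y):
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def qr_factor(A):
    """A = QR by modified Gram-Schmidt (assumes independent columns)."""
    n = len(A)
    Q = [[0.0] * n for _ in range(n)]
    R = [[0.0] * n for _ in range(n)]
    for j in range(n):
        v = [A[i][j] for i in range(n)]
        for i in range(j):
            R[i][j] = sum(Q[t][i] * v[t] for t in range(n))
            v = [v[t] - R[i][j] * Q[t][i] for t in range(n)]
        R[j][j] = math.sqrt(sum(t * t for t in v))
        for t in range(n):
            Q[t][j] = v[t] / R[j][j]
    return Q, R

def qr_method(A, steps):
    """Factor A_k = Q R, then reverse the factors: A_{k+1} = R Q."""
    Ak = [[float(x) for x in row] for row in A]
    for _ in range(steps):
        Q, R = qr_factor(Ak)
        Ak = matmul(R, Q)      # similar to Ak, so the eigenvalues are unchanged
    return Ak
```

For the symmetric matrix with rows (2, 1) and (1, 2), whose eigenvalues are 3 and 1, the entry below the diagonal shrinks at every step and the diagonal approaches 3 and 1.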
Problem Set 11.1


1 Find the two pivots with and without row exchange to maximize the pivot:

$$A = \begin{bmatrix} .001 & 0 \\ 1 & 1000 \end{bmatrix}$$

With row exchanges to maximize pivots, why are no entries of L larger than 1?
Find a 3 by 3 matrix A with all $|a_{ij}| \le 1$ and $|\ell_{ij}| \le 1$ but third pivot = 4.


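2 Compute the exact inverse of the Hilbert matrix A by elimination. Then compute
$A^{-1}$ again by rounding all numbers to three figures:

$$\text{Ill-conditioned matrix}\qquad A = \text{hilb}(3) = \begin{bmatrix} 1 & \frac12 & \frac13 \\ \frac12 & \frac13 & \frac14 \\ \frac13 & \frac14 & \frac15 \end{bmatrix}$$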


3 For the same A compute b = Ax for x = (1, 1, 1) and x = (0, 6, -3.6). A small
change $\Delta b$ produces a large change $\Delta x$.


4 Find the eigenvalues (by computer) of the 8 by 8 Hilbert matrix $a_{ij} = 1/(i + j - 1)$.
In the equation Ax = b with $\|b\| = 1$, how large can $\|x\|$ be? If b has roundoff error
less than $10^{-16}$, how large an error can this cause in x? See Section 9.2.


5 For back substitution with a band matrix (width w), show that the number of multi- 
plications to solve Ux = c is approximately wn. 


6 If you know L and U and Q and R, is it faster to solve LUx = b or QRx = b?


7 Show that the number of multiplications to invert an upper triangular n by n matrix
is about $\frac16 n^3$. Use back substitution on the columns of I, upward from 1’s.


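8 Choosing the largest available pivot in each column (partial pivoting), factor each A
into PA = LU:

$$A = \begin{bmatrix} 1 & 0 \\ 2 & 2 \end{bmatrix} \quad\text{and}\quad A = \begin{bmatrix} 1 & 0 & 1 \\ 2 & 2 & 0 \\ 0 & 2 & 0 \end{bmatrix}$$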


9 Put 1’s on the three central diagonals of a 4 by 4 tridiagonal matrix. Find the cofactors
of the six zero entries. Those entries are nonzero in $A^{-1}$.


10 (Suggested by C. Van Loan.) Find the LU factorization and solve by elimination
when $\varepsilon = 10^{-3}, 10^{-6}, 10^{-9}, 10^{-12}, 10^{-15}$:

$$\begin{bmatrix} \varepsilon & 1 \\ 1 & 1 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} 1 + \varepsilon \\ 2 \end{bmatrix}$$

The true x is (1, 1). Make a table to show the error for each $\varepsilon$. Exchange the two
equations and solve again. The errors should almost disappear.


11 (a) Choose $\sin\theta$ and $\cos\theta$ to triangularize A, and find R:

$$\text{Givens rotation}\qquad QA = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix} \begin{bmatrix} 1 & -1 \\ 3 & 5 \end{bmatrix} = \begin{bmatrix} * & * \\ 0 & * \end{bmatrix} = R.$$

(b) Choose $\sin\theta$ and $\cos\theta$ to make $QAQ^{-1}$ triangular. What are the eigenvalues?


12 When A is multiplied by a plane rotation $Q_{ij}$, which entries of A are changed?
When $Q_{ij}A$ is multiplied on the right by $Q_{ij}^{-1}$, which entries are changed now?


13 How many multiplications and how many additions are used to compute $Q_{ij}A$?
Careful organization of the whole sequence of rotations gives $\frac23 n^3$ multiplications
and $\frac23 n^3$ additions, the same as for QR by reflectors and twice as many as for LU.


Challenge Problems 


14 (Turning a robot hand) The robot produces any 3 by 3 rotation A from plane rotations
around the x, y, z axes. Then $Q_{32}Q_{31}Q_{21}A = R$, where A is orthogonal so R
is I! The three robot turns are in $A = Q_{21}^{-1}Q_{31}^{-1}Q_{32}^{-1}$. The three angles are “Euler
angles” and det Q = 1 to avoid reflection. Start by choosing $\cos\theta$ and $\sin\theta$ so that

$$Q_{21}A = \begin{bmatrix} \cos\theta & -\sin\theta & 0 \\ \sin\theta & \cos\theta & 0 \\ 0 & 0 & 1 \end{bmatrix} \frac13 \begin{bmatrix} -1 & 2 & 2 \\ 2 & -1 & 2 \\ 2 & 2 & -1 \end{bmatrix} \text{ is zero in the (2, 1) position.}$$


15 Create the 10 by 10 second difference matrix K = toeplitz([2 -1 zeros(1,8)]).
Permute rows and columns randomly by KK = K(randperm(10), randperm(10)).
Factor by [L, U] = lu(K) and [LL, UU] = lu(KK), and count nonzeros by nnz(L)
and nnz(LL). In this case L is in perfect tridiagonal order, but not LL.


16 Another ordering for this matrix K colors the meshpoints alternately red and black.
This permutation P changes the normal 1, ..., 10 to 1, 3, 5, 7, 9, 2, 4, 6, 8, 10:

$$\text{Red-black ordering}\qquad PKP^{T} = \begin{bmatrix} 2I & D \\ D^{T} & 2I \end{bmatrix}. \quad \text{Find the matrix } D.$$

So many interesting experiments are possible. If you send good ideas they can
go on the linear algebra website math.mit.edu/linearalgebra. I also recommend
learning the command B = sparse(A), after which find(B) will list the nonzero
entries and lu(B) will factor B using that sparse format for L and U. Only the
nonzeros are computed, where ordinary (dense) MATLAB computes all the zeros too.


17 Jeff Stuart has created a student activity that brilliantly demonstrates ill-conditioning:

$$\begin{bmatrix} 1 & 1.0001 \\ 1 & 1.0000 \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} 3.0001 + e \\ 3.0000 + E \end{bmatrix} \qquad \text{With errors } e \text{ and } E\text{:}\quad \begin{aligned} x &= 2 - 10000(e - E) \\ y &= 1 + 10000(e - E) \end{aligned}$$

When those equations are shown by nearly parallel long sticks, a small shake gives
a big jump in the crossing point (x, y). Errors e and E are amplified by 10000.
11.2 Norms and Condition Numbers


How do we measure the size of a matrix? For a vector, the length is $\|x\|$. For a matrix,
the norm is $\|A\|$. This word “norm” is sometimes used for vectors, instead of length. It
is always used for matrices, and there are many ways to measure $\|A\|$. We look at the
requirements on all “matrix norms” and then choose one.

Frobenius squared all the $|a_{ij}|^2$ and added; his norm $\|A\|_F$ is the square root. This
treats A like a long vector with $n^2$ components: sometimes useful, but not the choice here.

I prefer to start with a vector norm. The triangle inequality says that $\|x + y\|$ is not
greater than $\|x\| + \|y\|$. The length of 2x or -2x is doubled to $2\|x\|$. The same rules
will apply to matrix norms:

$$\|A + B\| \le \|A\| + \|B\| \quad\text{and}\quad \|cA\| = |c|\,\|A\| \tag{1}$$

The second requirements for a matrix norm are new, because matrices multiply. The
norm $\|A\|$ controls the growth from x to Ax, and from B to AB:

$$\text{Growth factor } \|A\|\qquad \|Ax\| \le \|A\|\,\|x\| \quad\text{and}\quad \|AB\| \le \|A\|\,\|B\| \tag{2}$$


This leads to a natural way to define $\|A\|$, the norm of a matrix:

$$\text{The norm of } A \text{ is the largest ratio } \|Ax\|/\|x\|\text{:}\qquad \|A\| = \max_{x \ne 0} \frac{\|Ax\|}{\|x\|} \tag{3}$$

$\|Ax\|/\|x\|$ is never larger than $\|A\|$ (its maximum). This says that $\|Ax\| \le \|A\|\,\|x\|$.


Example 1 If A is the identity matrix I, the ratios are $\|x\|/\|x\|$. Therefore $\|I\| = 1$. If
A is an orthogonal matrix Q, lengths are again preserved: $\|Qx\| = \|x\|$. The ratios still
give $\|Q\| = 1$. An orthogonal Q is good to compute with: errors don’t grow.


Example 2 The norm of a diagonal matrix is its largest entry (using absolute values):

$$A = \begin{bmatrix} 2 & 0 \\ 0 & 3 \end{bmatrix} \text{ has norm } \|A\| = 3. \qquad \text{The eigenvector } x = \begin{bmatrix} 0 \\ 1 \end{bmatrix} \text{ has } Ax = 3x.$$

The eigenvalue is 3. For this A (but not all A), the largest eigenvalue equals the norm.


For a positive definite symmetric matrix the norm is $\|A\| = \lambda_{\max}(A)$.


Choose x to be the eigenvector with maximum eigenvalue. Then $\|Ax\|/\|x\|$ equals $\lambda_{\max}$.
The point is that no other x can make the ratio larger. The matrix is $A = Q\Lambda Q^{T}$, and the
orthogonal matrices Q and $Q^{T}$ leave lengths unchanged. So the ratio to maximize is really
$\|\Lambda x\|/\|x\|$. The norm is the largest eigenvalue in the diagonal $\Lambda$.
Symmetric matrices Suppose A is symmetric but not positive definite. $A = Q\Lambda Q^{T}$ is
still true. Then the norm is the largest of $|\lambda_1|, |\lambda_2|, \ldots, |\lambda_n|$. We take absolute values,
because the norm is only concerned with length. For an eigenvector $\|Ax\| = \|\lambda x\| = |\lambda|$
times $\|x\|$. The x that gives the maximum ratio is the eigenvector for the maximum $|\lambda|$.


Unsymmetric matrices If A is not symmetric, its eigenvalues may not measure its true
size. The norm can be larger than any eigenvalue. A very unsymmetric example has
$\lambda_1 = \lambda_2 = 0$ but its norm is not zero:

$$\|A\| > \lambda_{\max}\qquad A = \begin{bmatrix} 0 & 2 \\ 0 & 0 \end{bmatrix} \text{ has norm } \|A\| = \max_{x \ne 0} \frac{\|Ax\|}{\|x\|} = 2.$$

The vector x = (0, 1) gives Ax = (2, 0). The ratio of lengths is 2/1. This is the maximum
ratio $\|A\|$, even though x is not an eigenvector.


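It is the symmetric matrix $A^{T}A$, not the unsymmetric A, that has eigenvector
x = (0, 1). The norm is really decided by the largest eigenvalue of $A^{T}A$:


The norm of A (symmetric or not) is the square root of $\lambda_{\max}(A^{T}A)$:

$$\|A\|^2 = \max_{x \ne 0} \frac{\|Ax\|^2}{\|x\|^2} = \max_{x \ne 0} \frac{x^{T}A^{T}Ax}{x^{T}x} = \lambda_{\max}(A^{T}A) \tag{4}$$

The unsymmetric example with $\lambda_{\max}(A) = 0$ has $\lambda_{\max}(A^{T}A) = 4$:

$$A = \begin{bmatrix} 0 & 2 \\ 0 & 0 \end{bmatrix} \text{ leads to } A^{T}A = \begin{bmatrix} 0 & 0 \\ 0 & 4 \end{bmatrix} \text{ with } \lambda_{\max} = 4. \text{ So the norm is } \|A\| = \sqrt{4}.$$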


For any A Choose x to be the eigenvector of $A^{T}A$ with largest eigenvalue $\lambda_{\max}$. The
ratio in equation (4) is $x^{T}A^{T}Ax = x^{T}(\lambda_{\max}x)$ divided by $x^{T}x$. This is $\lambda_{\max}$.

No x can give a larger ratio. The symmetric matrix $A^{T}A$ has eigenvalues $\lambda_1, \ldots, \lambda_n$
and orthonormal eigenvectors $q_1, q_2, \ldots, q_n$. Every x is a combination of those vectors.
Try this combination in the ratio and remember that $q_i^{T}q_j = 0$:

$$\frac{x^{T}A^{T}Ax}{x^{T}x} = \frac{(c_1q_1 + \cdots + c_nq_n)^{T}(c_1\lambda_1q_1 + \cdots + c_n\lambda_nq_n)}{(c_1q_1 + \cdots + c_nq_n)^{T}(c_1q_1 + \cdots + c_nq_n)} = \frac{c_1^2\lambda_1 + \cdots + c_n^2\lambda_n}{c_1^2 + \cdots + c_n^2}.$$

The maximum ratio $\lambda_{\max}$ is when all c’s are zero, except the one that multiplies $\lambda_{\max}$.


Note 1 The ratio in equation (4) is the Rayleigh quotient for the symmetric matrix $A^{T}A$.
Its maximum is the largest eigenvalue $\lambda_{\max}(A^{T}A)$. The minimum ratio is $\lambda_{\min}(A^{T}A)$.
If you substitute any vector x into the Rayleigh quotient $x^{T}A^{T}Ax/x^{T}x$, you are guaranteed
to get a number between $\lambda_{\min}(A^{T}A)$ and $\lambda_{\max}(A^{T}A)$.
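For a 2 by 2 matrix the largest eigenvalue of $A^{T}A$ has a closed form, so the norm can be checked by hand or by a few lines of code. The following Python sketch is only an illustration; the function names `norm2_2x2` and `rayleigh` are hypothetical, and the formula used is the quadratic formula for a symmetric 2 by 2 matrix.

```python
import math

def norm2_2x2(A):
    """||A|| = sqrt(lambda_max(A^T A)) for a real 2 by 2 matrix A."""
    (a, b), (c, d) = A
    # A^T A is the symmetric matrix [[p, q], [q, r]]
    p, q, r = a * a + c * c, a * b + c * d, b * b + d * d
    lam_max = (p + r) / 2 + math.sqrt(((p - r) / 2) ** 2 + q * q)
    return math.sqrt(lam_max)

def rayleigh(A, x):
    """Rayleigh quotient x^T A^T A x / x^T x, which equals ||Ax||^2 / ||x||^2."""
    Ax = [A[0][0] * x[0] + A[0][1] * x[1],
          A[1][0] * x[0] + A[1][1] * x[1]]
    return (Ax[0] ** 2 + Ax[1] ** 2) / (x[0] ** 2 + x[1] ** 2)
```

For the matrix with rows (0, 2) and (0, 0), `norm2_2x2` returns 2 even though both eigenvalues of A are zero, and `rayleigh` at x = (0, 1) returns the maximum value 4. Any other x gives a Rayleigh quotient between 0 and 4.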
Note 2 The norm $\|A\|$ equals the largest singular value $\sigma_{\max}$ of A. The singular values
$\sigma_1, \ldots, \sigma_r$ are the square roots of the positive eigenvalues of $A^{T}A$. So certainly
$\sigma_{\max} = (\lambda_{\max})^{1/2}$. Since U and V are orthogonal in $A = U\Sigma V^{T}$, the norm is $\|A\| = \sigma_{\max}$.


The Condition Number of A 


Section 9.1 showed that roundoff error can be serious. Some systems are sensitive, others
are not so sensitive. The sensitivity to error is measured by the condition number. This
is the first chapter in the book which intentionally introduces errors. We want to estimate
how much they change x.

The original equation is Ax = b. Suppose the right side is changed to $b + \Delta b$
because of roundoff or measurement error. The solution is then changed to $x + \Delta x$. Our
goal is to estimate the change $\Delta x$ in the solution from the change $\Delta b$ in the equation.
Subtraction gives the error equation $A(\Delta x) = \Delta b$:

$$\text{Subtract } Ax = b \text{ from } A(x + \Delta x) = b + \Delta b \text{ to find } A(\Delta x) = \Delta b. \tag{5}$$

The error is $\Delta x = A^{-1}\Delta b$. It is large when $A^{-1}$ is large (then A is nearly singular). The
error $\Delta x$ is especially large when $\Delta b$ points in the worst direction, the one that is amplified
most by $A^{-1}$. The worst error has $\|\Delta x\| = \|A^{-1}\|\,\|\Delta b\|$.

This error bound $\|A^{-1}\|$ has one serious drawback. If we multiply A by 1000, then
$A^{-1}$ is divided by 1000. The matrix looks a thousand times better. But a simple rescaling
cannot change the reality of the problem. It is true that $\Delta x$ will be divided by 1000, but so
will the exact solution $x = A^{-1}b$. The relative error $\|\Delta x\|/\|x\|$ will stay the same. It is
this relative change in x that should be compared to the relative change in b.

Comparing relative errors will now lead to the “condition number” $c = \|A\|\,\|A^{-1}\|$.
Multiplying A by 1000 does not change this number, because $A^{-1}$ is divided by 1000 and
the condition number c stays the same. It measures the sensitivity of Ax = b.


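The solution error is less than $c = \|A\|\,\|A^{-1}\|$ times the problem error:

$$\text{Condition number } c\qquad \frac{\|\Delta x\|}{\|x\|} \le c\,\frac{\|\Delta b\|}{\|b\|}. \tag{6}$$

If the problem error is $\Delta A$ (error in A instead of b), still c controls $\Delta x$:

$$\text{Error } \Delta A \text{ in } A\qquad \frac{\|\Delta x\|}{\|x + \Delta x\|} \le c\,\frac{\|\Delta A\|}{\|A\|}. \tag{7}$$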




Proof The original equation is b = Ax. The error equation (5) is $\Delta x = A^{-1}\Delta b$.
Apply the key property $\|Ax\| \le \|A\|\,\|x\|$ of matrix norms:

$$\|b\| \le \|A\|\,\|x\| \quad\text{and}\quad \|\Delta x\| \le \|A^{-1}\|\,\|\Delta b\|.$$

Multiply the left sides to get $\|b\|\,\|\Delta x\|$, and multiply the right sides to get $c\,\|x\|\,\|\Delta b\|$.
Divide both sides by $\|b\|\,\|x\|$. The left side is now the relative error $\|\Delta x\|/\|x\|$. The
right side is now the upper bound in equation (6).

The same condition number $c = \|A\|\,\|A^{-1}\|$ appears when the error is in the matrix.
We have $\Delta A$ instead of $\Delta b$ in the error equation:

$$\text{Subtract } Ax = b \text{ from } (A + \Delta A)(x + \Delta x) = b \text{ to find } A(\Delta x) = -(\Delta A)(x + \Delta x).$$

Multiply the last equation by $A^{-1}$ and take norms to reach equation (7):

$$\|\Delta x\| \le \|A^{-1}\|\,\|\Delta A\|\,\|x + \Delta x\| \quad\text{or}\quad \frac{\|\Delta x\|}{\|x + \Delta x\|} \le \|A\|\,\|A^{-1}\|\,\frac{\|\Delta A\|}{\|A\|}.$$


Conclusion Errors enter in two ways. They begin with an error $\Delta A$ or $\Delta b$ (a wrong
matrix or a wrong b). This problem error is amplified (a lot or a little) into the solution error
$\Delta x$. That error is bounded, relative to x itself, by the condition number c.

The error $\Delta b$ depends on computer roundoff and on the original measurements of b.
The error $\Delta A$ also depends on the elimination steps. Small pivots tend to produce large
errors in L and U. Then $L + \Delta L$ times $U + \Delta U$ equals $A + \Delta A$. When $\Delta A$ or the condition
number is very large, the error $\Delta x$ can be unacceptable.


Example 3 When A is symmetric, $c = \|A\|\,\|A^{-1}\|$ comes from the eigenvalues:

$$A = \begin{bmatrix} 6 & 0 \\ 0 & 2 \end{bmatrix} \text{ has norm } 6. \qquad A^{-1} = \begin{bmatrix} \frac16 & 0 \\ 0 & \frac12 \end{bmatrix} \text{ has norm } \frac12.$$

This A is symmetric positive definite. Its norm is $\lambda_{\max} = 6$. The norm of $A^{-1}$ is
$1/\lambda_{\min} = \frac12$. Multiplying norms gives the condition number $\|A\|\,\|A^{-1}\| = \lambda_{\max}/\lambda_{\min}$:

$$\text{Condition number for positive definite } A\qquad c = \frac{\lambda_{\max}}{\lambda_{\min}} = \frac62 = 3.$$

Example 4 Keep the same A, with eigenvalues 6 and 2. To make x small, choose b along
the first eigenvector (1, 0). To make $\Delta x$ large, choose $\Delta b$ along the second eigenvector
(0, 1). Then $x = \frac16 b$ and $\Delta x = \frac12 \Delta b$. The ratio $\|\Delta x\|/\|x\|$ is exactly c = 3 times the
ratio $\|\Delta b\|/\|b\|$.

This shows that the worst error allowed by the condition number $\|A\|\,\|A^{-1}\|$ can
actually happen. Here is a useful rule of thumb, experimentally verified for Gaussian
elimination: The computer can lose log c decimal places to roundoff error.
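The worst case for the diagonal matrix with entries 6 and 2 can be checked directly. This is a small Python sketch under stated assumptions: the function name `relative_amplification` is hypothetical, the matrix is taken to be diagonal so that solving is a division, and the logarithm in the rule of thumb is read as base 10.

```python
import math

def relative_amplification(d, b, db):
    """For A = diag(d): the ratio (||dx|| / ||x||) / (||db|| / ||b||)."""
    x = [bi / di for bi, di in zip(b, d)]        # solves A x = b
    dx = [dbi / di for dbi, di in zip(db, d)]    # solves A (dx) = db
    length = lambda v: math.sqrt(sum(t * t for t in v))
    return (length(dx) / length(x)) / (length(db) / length(b))

d = [6.0, 2.0]
c = max(d) / min(d)                                          # condition number 3
worst = relative_amplification(d, [1.0, 0.0], [0.0, 1e-3])   # b and db as in the example
mild = relative_amplification(d, [0.0, 1.0], [1e-3, 0.0])    # the opposite choice
digits_lost = math.log10(c)                                  # rule-of-thumb estimate
```

With b along (1, 0) and the error along (0, 1) the amplification reaches c = 3. Swapping the two directions gives 1/3, so the bound is a worst case and not a prediction for every right side.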
Problem Set 11.2


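1 Find the norms $\|A\| = \lambda_{\max}$ and condition numbers $c = \lambda_{\max}/\lambda_{\min}$ of these positive
definite matrices:

$$\begin{bmatrix} .5 & 0 \\ 0 & 2 \end{bmatrix} \qquad \begin{bmatrix} 2 & 1 \\ 1 & 2 \end{bmatrix} \qquad \begin{bmatrix} 3 & 1 \\ 1 & 1 \end{bmatrix}$$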


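2 Find the norms and condition numbers from the square roots of $\lambda_{\max}(A^{T}A)$ and
$\lambda_{\min}(A^{T}A)$. Without positive definiteness in A, we go to $A^{T}A$!

$$\begin{bmatrix} -2 & 0 \\ 0 & 2 \end{bmatrix} \qquad \begin{bmatrix} 1 & 1 \\ 0 & 0 \end{bmatrix} \qquad \begin{bmatrix} 1 & 1 \\ -1 & 1 \end{bmatrix}$$
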
3 Explain these two inequalities from the definitions (3) of $\|A\|$ and $\|B\|$:

$$\|ABx\| \le \|A\|\,\|Bx\| \le \|A\|\,\|B\|\,\|x\|.$$

From the ratio of $\|ABx\|$ to $\|x\|$, deduce that $\|AB\| \le \|A\|\,\|B\|$. This is the key to
using matrix norms. The norm of $A^n$ is never larger than $\|A\|^n$.


4 Use $\|AA^{-1}\| \le \|A\|\,\|A^{-1}\|$ to prove that the condition number is at least 1.


5 Why is I the only symmetric positive definite matrix that has $\lambda_{\max} = \lambda_{\min} = 1$?
Then the only other matrices with $\|A\| = 1$ and $\|A^{-1}\| = 1$ must have $A^{T}A = I$.
Those are ____ matrices: perfectly conditioned.


6 Orthogonal matrices have norm $\|Q\| = 1$. If A = QR show that $\|A\| \le \|R\|$ and
also $\|R\| \le \|A\|$. Then $\|A\| = \|Q\|\,\|R\|$. Find an example of A = LU with
$\|A\| < \|L\|\,\|U\|$.


7 (a) Which famous inequality gives $\|(A + B)x\| \le \|Ax\| + \|Bx\|$ for every x?
(b) Why does the definition (3) of matrix norms lead to $\|A + B\| \le \|A\| + \|B\|$?


8 Show that if $\lambda$ is any eigenvalue of A, then $|\lambda| \le \|A\|$. Start from $Ax = \lambda x$.


9 The “spectral radius” $\rho(A) = |\lambda_{\max}|$ is the largest absolute value of the eigenvalues.
Show with 2 by 2 examples that $\rho(A + B) \le \rho(A) + \rho(B)$ and $\rho(AB) \le \rho(A)\rho(B)$
can both be false. The spectral radius is not acceptable as a norm.


10 (a) Explain why A and $A^{-1}$ have the same condition number.
(b) Explain why A and $A^{T}$ have the same norm, based on $\lambda(A^{T}A)$ and $\lambda(AA^{T})$.


11 Estimate the condition number of the ill-conditioned matrix $A = \begin{bmatrix} 1 & 1 \\ 1 & 1.0001 \end{bmatrix}$.


12 Why is the determinant of A no good as a norm? Why is it no good as a condition 
number? 
13 (Suggested by C. Moler and C. Van Loan.) Compute b - Ay and b - Az when
p= AN Ta .780 .563 = | -341 "E .999 
E ~ [01s .659| e a | 
Is y closer than z to solving Ax = b? Answer in two ways: Compare the residual
b - Ay to b - Az. Then compare y and z to the true x = (1, -1). Both answers
can be right. Sometimes we want a small residual, sometimes a small $\Delta x$.


14 (a) Compute the determinant of A in Problem 13. Compute $A^{-1}$.
(b) If possible compute $\|A\|$ and $\|A^{-1}\|$ and show that $c > 10^6$.


Problems 15-19 are about vector norms other than the usual $\|x\| = \sqrt{x \cdot x}$.

15 The “$\ell^1$ norm” and the “$\ell^\infty$ norm” of $x = (x_1, \ldots, x_n)$ are

$$\|x\|_1 = |x_1| + \cdots + |x_n| \quad\text{and}\quad \|x\|_\infty = \max_{1 \le i \le n} |x_i|.$$


Compute the norms $\|x\|$ and $\|x\|_1$ and $\|x\|_\infty$ of these two vectors in $\mathbf{R}^5$:
ERRE et. 
16 Prove that $\|x\|_\infty \le \|x\| \le \|x\|_1$. Show from the Schwarz inequality that the ratios
$\|x\|/\|x\|_\infty$ and $\|x\|_1/\|x\|$ are never larger than $\sqrt{n}$. Which vector $(x_1, \ldots, x_n)$
gives ratios equal to $\sqrt{n}$?


17 All vector norms must satisfy the triangle inequality. Prove that

$$\|x + y\|_\infty \le \|x\|_\infty + \|y\|_\infty \quad\text{and}\quad \|x + y\|_1 \le \|x\|_1 + \|y\|_1.$$


18 Vector norms must also satisfy $\|cx\| = |c|\,\|x\|$. The norm must be positive except
when x = 0. Which of these are norms for vectors $(x_1, x_2)$ in $\mathbf{R}^2$?

$$\|x\|_A = |x_1| + 2|x_2| \qquad \|x\|_B = \min(|x_1|, |x_2|)$$
$$\|x\|_C = \|x\| + \|x\|_\infty \qquad \|x\|_D = \|Ax\| \text{ (this answer depends on } A\text{).}$$

Challenge Problems 


19 Show that $x^{T}y \le \|x\|_1\,\|y\|_\infty$ by choosing components $y_i = \pm 1$ to make $x^{T}y$ as
large as possible.


20 The eigenvalues of the -1, 2, -1 difference matrix K are $\lambda_j = 2 - 2\cos\frac{j\pi}{n+1}$.
Estimate $\lambda_{\min}$ and $\lambda_{\max}$ and $c = \text{cond}(K) = \lambda_{\max}/\lambda_{\min}$ as n increases: $c \approx Cn^2$
with what constant C? 


Test this estimate with eig( K) and cond(K) for n = 10, 100, 1000. 
11.3 Iterative Methods and Preconditioners


Up to now, our approach to Ax = b has been direct. We accepted A as it came. We 
attacked it by elimination with row exchanges. We now look at iterative methods, which 
replace A by a simpler matrix S. The difference T = S — A is moved over to the right 
side of the equation. The problem becomes easier to solve, with S instead of A. But there 
is a price—the simpler system has to be solved over and over. 

An iterative method is easy to invent. Just split A (carefully) into S — T. 


$$\text{Rewrite } Ax = b\qquad Sx = Tx + b. \tag{1}$$

The novelty is to solve (1) iteratively. Each guess $x_k$ leads to the next $x_{k+1}$:

$$\text{Pure iteration}\qquad Sx_{k+1} = Tx_k + b. \tag{2}$$

Start with any $x_0$. Then solve $Sx_1 = Tx_0 + b$. Continue to $Sx_2 = Tx_1 + b$. A hundred
iterations are very common, often more. Stop when (and if!) $x_{k+1}$ is sufficiently close
to $x_k$, or when the residual $r_k = b - Ax_k$ is near zero. Our hope is to get near the true
solution, more quickly than by elimination. When the $x_k$ converge, their limit $x_\infty$ does
solve equation (1): $Sx_\infty = Tx_\infty + b$ means $Ax_\infty = b$.

The two goals of the splitting A = S - T are speed per step and fast convergence.
The speed of each step depends on S and the speed of convergence depends on $S^{-1}T$:


1 Equation (2) should be easy to solve for $x_{k+1}$. The “preconditioner” S could be the
diagonal or triangular part of A. A fast way uses $S = L_0U_0$, where those factors
have many zeros compared to the exact A = LU. This is “incomplete LU”.


2 The difference $x - x_k$ (this is the error $e_k$) should go quickly to zero. Subtracting
equation (2) from (1) cancels b, and it leaves the equation for the error $e_k$:

$$\text{Error equation}\qquad Se_{k+1} = Te_k \quad\text{which means}\quad e_{k+1} = S^{-1}Te_k. \tag{3}$$

At every step the error is multiplied by $S^{-1}T$. If $S^{-1}T$ is small, its powers go quickly to
zero. But what is “small”?

The extreme splitting is S = A and T = 0. Then the first step of the iteration is the
original Ax = b. Convergence is perfect and $S^{-1}T$ is zero. But the cost of that step is
what we wanted to avoid. The choice of S is a battle between speed per step (a simple S)
and fast convergence (S close to A). Here are some choices of S:


J S = diagonal part of A (the iteration is called Jacobi’s method) 
GS S = lower triangular part of A including the diagonal (Gauss-Seidel method) 


ILU S = approximate L times approximate U (incomplete LU method). 
Our first question is pure linear algebra: When do the $x_k$’s converge to x? The
answer uncovers the number $|\lambda|_{\max}$ that controls convergence. In examples of Jacobi and
Gauss-Seidel, we will compute this “spectral radius” $|\lambda|_{\max}$. It is the largest absolute value
among the eigenvalues of the iteration matrix $B = S^{-1}T$.


The Spectral Radius $\rho(B)$ Controls Convergence


Equation (3) is $e_{k+1} = S^{-1}Te_k$. Every iteration step multiplies the error by the same
matrix $B = S^{-1}T$. The error after k steps is $e_k = B^ke_0$. The error approaches zero if
the powers of $B = S^{-1}T$ approach zero. It is beautiful to see how the eigenvalues of
B (the largest eigenvalue in particular) control the matrix powers $B^k$.


The powers $B^k$ approach zero if and only if every eigenvalue of B has $|\lambda| < 1$.


The rate of convergence is controlled by the spectral radius of B: $\rho = \max|\lambda(B)|$.


The test for convergence is $|\lambda|_{\max} < 1$. Real eigenvalues must lie between -1 and 1.
Complex eigenvalues $\lambda = a + ib$ must have $|\lambda|^2 = a^2 + b^2 < 1$. The spectral radius “rho”
is the largest distance from 0 to the eigenvalues of $B = S^{-1}T$. This is $\rho = |\lambda|_{\max}$.

To see why $|\lambda|_{\max} < 1$ is necessary, suppose the starting error $e_0$ happens to be an
eigenvector of B. After one step the error is $Be_0 = \lambda e_0$. After k steps the error is
$B^ke_0 = \lambda^ke_0$. If we start with an eigenvector, we continue with that eigenvector, and the
factor $\lambda^k$ only goes to zero when $|\lambda| < 1$. This condition is required of every eigenvalue.

To see why $|\lambda|_{\max} < 1$ is sufficient for the error to approach zero, suppose $e_0$ is a
combination of eigenvectors:


$$e_0 = c_1x_1 + \cdots + c_nx_n \quad\text{leads to}\quad e_k = c_1(\lambda_1)^kx_1 + \cdots + c_n(\lambda_n)^kx_n. \tag{4}$$

This is the point of eigenvectors! When we multiply by B, each eigenvector $x_i$ is multiplied
by $\lambda_i$. If all $|\lambda_i| < 1$ then equation (4) ensures that $e_k$ goes to zero.

$$\text{Example 1}\qquad B = \begin{bmatrix} .6 & .5 \\ .6 & .5 \end{bmatrix} \text{ has } \lambda_{\max} = 1.1 \qquad B' = \begin{bmatrix} .6 & 1.1 \\ 0 & .5 \end{bmatrix} \text{ has } \lambda_{\max} = .6$$

$B^2$ is 1.1 times B. Then $B^3$ is $(1.1)^2$ times B. The powers of B will blow up.
Contrast with the powers of $B'$. The matrix $(B')^k$ has $(.6)^k$ and $(.5)^k$ on its diagonal.
The off-diagonal entries also involve $\rho^k = (.6)^k$, which sets the speed of convergence.
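The contrast between a spectral radius above 1 and below 1 is easy to see by multiplying out powers. The Python sketch below is only an illustration; the helper names are hypothetical, and the two matrices are written out in the code: B with rows (.6, .5) and (.6, .5), and B' with rows (.6, 1.1) and (0, .5).

```python
def matmul2(X, Y):
    """Product of two 2 by 2 matrices stored as nested lists."""
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def matrix_power(B, k):
    """B^k by repeated multiplication (fine for small k)."""
    P = [[1.0, 0.0], [0.0, 1.0]]
    for _ in range(k):
        P = matmul2(P, B)
    return P

B = [[0.6, 0.5], [0.6, 0.5]]         # eigenvalues 1.1 and 0: powers grow
B_prime = [[0.6, 1.1], [0.0, 0.5]]   # eigenvalues .6 and .5: powers decay

grown = matrix_power(B, 60)
decayed = matrix_power(B_prime, 60)
```

After 60 steps every entry of `grown` is larger than 100, while every entry of `decayed` is far below $10^{-10}$, even though B' has the larger off-diagonal entry.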


Note When there are too few eigenvectors, equation (4) is not correct. We turn to the 
Jordan form when eigenvectors are missing and the matrix B can’t be diagonalized: 


$$\text{Jordan form } J\qquad B = MJM^{-1} \quad\text{and}\quad B^k = MJ^kM^{-1}. \tag{5}$$

Section 8.3 shows how J and $J^k$ are made of “blocks” with one repeated eigenvalue:

$$\text{The powers of a 2 by 2 block in } J \text{ are}\qquad \begin{bmatrix} \lambda & 1 \\ 0 & \lambda \end{bmatrix}^k = \begin{bmatrix} \lambda^k & k\lambda^{k-1} \\ 0 & \lambda^k \end{bmatrix}.$$

If $|\lambda| < 1$ then these powers approach zero. The extra factor k from a double eigenvalue is
overwhelmed by the decreasing factor $\lambda^{k-1}$. This applies to every block:


Diagonalizable or not: Convergence $B^k \to 0$ and its speed depend on $\rho = |\lambda|_{\max} < 1$.


Jacobi versus Gauss-Seidel 
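We now solve a specific 2 by 2 problem by splitting A. Watch for that number $|\lambda|_{\max}$.

$$Ax = b\qquad \begin{aligned} 2u - v &= 4 \\ -u + 2v &= -2 \end{aligned} \qquad\text{has the solution}\quad \begin{bmatrix} u \\ v \end{bmatrix} = \begin{bmatrix} 2 \\ 0 \end{bmatrix}. \tag{6}$$

The first splitting is Jacobi’s method. Keep the diagonal of A on the left side (this is S).
Move the off-diagonal part of A to the right side (this is T). Then iterate:

$$\text{Jacobi iteration}\qquad Sx_{k+1} = Tx_k + b \qquad \begin{aligned} 2u_{k+1} &= v_k + 4 \\ 2v_{k+1} &= u_k - 2. \end{aligned}$$

Start from $u_0 = v_0 = 0$. The first step finds $u_1 = 2$ and $v_1 = -1$. Keep going:

$$\begin{bmatrix} 0 \\ 0 \end{bmatrix} \quad \begin{bmatrix} 2 \\ -1 \end{bmatrix} \quad \begin{bmatrix} 3/2 \\ 0 \end{bmatrix} \quad \begin{bmatrix} 2 \\ -1/4 \end{bmatrix} \quad \begin{bmatrix} 15/8 \\ 0 \end{bmatrix} \quad \begin{bmatrix} 2 \\ -1/16 \end{bmatrix} \quad\text{approaches}\quad \begin{bmatrix} 2 \\ 0 \end{bmatrix}.$$

This shows convergence. At steps 1, 3, 5 the second component is -1, -1/4, -1/16.
Those drop by 4 in each two steps. The error equation is $Se_{k+1} = Te_k$:

$$\text{Error equation}\qquad \begin{bmatrix} 2 & 0 \\ 0 & 2 \end{bmatrix} e_{k+1} = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix} e_k \quad\text{or}\quad e_{k+1} = \begin{bmatrix} 0 & \frac12 \\ \frac12 & 0 \end{bmatrix} e_k. \tag{7}$$

That last matrix $S^{-1}T$ has eigenvalues $\frac12$ and $-\frac12$. So its spectral radius is $\rho(B) = \frac12$:

$$B = S^{-1}T = \begin{bmatrix} 0 & \frac12 \\ \frac12 & 0 \end{bmatrix} \text{ has } |\lambda|_{\max} = \frac12 \quad\text{and}\quad \begin{bmatrix} 0 & \frac12 \\ \frac12 & 0 \end{bmatrix}^2 = \begin{bmatrix} \frac14 & 0 \\ 0 & \frac14 \end{bmatrix}.$$
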
Two steps multiply the error by $\frac14$ exactly, in this special example. The important message
is this: Jacobi’s method works well when the main diagonal of A is large compared to the
off-diagonal part. The diagonal part is S, the rest is -T. We want the diagonal to dominate.
The eigenvalue $\lambda = \frac12$ is unusually small. Ten iterations reduce the error by
$2^{10} = 1024$. More typical and more expensive is $|\lambda|_{\max} = .99$ or .999.
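The Jacobi steps for the system 2u - v = 4, -u + 2v = -2 can be replayed in exact arithmetic. This is a minimal Python sketch; the function names are hypothetical, and `fractions.Fraction` is used only so that the iterates print as exact fractions.

```python
from fractions import Fraction as F

def jacobi_step(u, v):
    """One Jacobi step: both new values are computed from the old u and v."""
    return (v + 4) / F(2), (u - 2) / F(2)

def jacobi(u, v, steps):
    """Return the list of iterates (u_k, v_k) for k = 0, ..., steps."""
    history = [(u, v)]
    for _ in range(steps):
        u, v = jacobi_step(u, v)
        history.append((u, v))
    return history

history = jacobi(F(0), F(0), 5)
```

Starting from (0, 0), the second component at steps 1, 3, 5 is -1, -1/4, -1/16: the error drops by 4 in every two steps, which matches a spectral radius of 1/2.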
The Gauss-Seidel method keeps the whole lower triangular part of A as S:

$$\text{Gauss-Seidel}\qquad \begin{aligned} 2u_{k+1} &= v_k + 4 \\ -u_{k+1} + 2v_{k+1} &= -2 \end{aligned} \qquad\text{or}\qquad \begin{aligned} u_{k+1} &= \tfrac12 v_k + 2 \\ v_{k+1} &= \tfrac12 u_{k+1} - 1. \end{aligned} \tag{8}$$

Notice the change. The new $u_{k+1}$ from the first equation is used immediately in the second
equation. With Jacobi, we saved the old $u_k$ until the whole step was complete. With
Gauss-Seidel, the new values enter right away and the old $u_k$ is destroyed. This cuts the storage in
half. It also speeds up the iteration (usually). And it costs no more than the Jacobi method.
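Test the iteration starting from another start $u_0 = 0$ and $v_0 = -1$:

$$\begin{bmatrix} 0 \\ -1 \end{bmatrix} \quad \begin{bmatrix} 3/2 \\ -1/4 \end{bmatrix} \quad \begin{bmatrix} 15/8 \\ -1/16 \end{bmatrix} \quad \begin{bmatrix} 63/32 \\ -1/64 \end{bmatrix} \quad\text{approaches}\quad \begin{bmatrix} 2 \\ 0 \end{bmatrix}.$$

The errors in the first component are 2, 1/2, 1/8, 1/32. The errors in the second component
are -1, -1/4, -1/16, -1/64. We divide by 4 in one step not two steps. Gauss-Seidel is
twice as fast as Jacobi. We have $\rho_{GS} = (\rho_J)^2$ when A is positive definite tridiagonal:

$$S^{-1}T = \begin{bmatrix} 2 & 0 \\ -1 & 2 \end{bmatrix}^{-1} \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix} = \begin{bmatrix} 0 & \frac12 \\ 0 & \frac14 \end{bmatrix} \text{ has } |\lambda|_{\max} = \frac14.$$

The Gauss-Seidel eigenvalues are 0 and $\frac14$. Compare with $\frac12$ and $-\frac12$ for Jacobi.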
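The same 2 by 2 system, 2u - v = 4 and -u + 2v = -2, can be run with Gauss-Seidel. In this minimal Python sketch (the function name is hypothetical, and exact fractions are an illustrative choice), the only change from a Jacobi loop is that the new u is used at once when v is updated.

```python
from fractions import Fraction as F

def gauss_seidel(u, v, steps):
    """Return the iterates (u_k, v_k); each new u is used immediately for v."""
    history = [(u, v)]
    for _ in range(steps):
        u = v / 2 + 2      # first equation, using the old v
        v = u / 2 - 1      # second equation, using the new u
        history.append((u, v))
    return history

history = gauss_seidel(F(0), F(-1), 3)
```

Starting from (0, -1) the iterates are (3/2, -1/4), (15/8, -1/16), (63/32, -1/64). Both components of the error are divided by 4 at every step, which matches the eigenvalue 1/4.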


With a small push we can describe the successive overrelaxation method (SOR).
The new idea is to introduce a parameter $\omega$ (omega) into the iteration. Then choose this
number $\omega$ to make the spectral radius of $S^{-1}T$ as small as possible.

Rewrite Ax = b as $\omega Ax = \omega b$. The matrix S in SOR has the diagonal of the original
A, but below the diagonal we use $\omega A$. On the right side T is $S - \omega A$:

$$\text{SOR}\qquad \begin{aligned} 2u_{k+1} &= (2 - 2\omega)u_k + \omega v_k + 4\omega \\ -\omega u_{k+1} + 2v_{k+1} &= (2 - 2\omega)v_k - 2\omega. \end{aligned} \tag{9}$$

This looks more complicated to us, but the computer goes as fast as ever. SOR is like
Gauss-Seidel, with an adjustable number $\omega$. The best $\omega$ makes it faster.

I will put on record the most valuable test matrix of order n. It is our favorite -1, 2,
-1 tridiagonal matrix K. The diagonal is 2I. Below and above are -1’s. Our example had
n = 2, which leads to $\cos\frac{\pi}{3} = \frac12$ as the Jacobi eigenvalue found above. Notice especially
that this $|\lambda|_{\max}$ is squared for Gauss-Seidel:


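The splittings of the -1, 2, -1 matrix K of order n yield these eigenvalues of B:

Jacobi (S = 0, 2, 0 matrix): $S^{-1}T$ has $|\lambda|_{\max} = \cos\dfrac{\pi}{n+1}$

Gauss-Seidel (S = -1, 2, 0 matrix): $S^{-1}T$ has $|\lambda|_{\max} = \left(\cos\dfrac{\pi}{n+1}\right)^2$

SOR (with the best $\omega$): $S^{-1}T$ has $|\lambda|_{\max} = \left(\cos\dfrac{\pi}{n+1}\right)^2 \Big/ \left(1 + \sin\dfrac{\pi}{n+1}\right)^2$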
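To get a feel for these three rates, the formulas $\cos\frac{\pi}{n+1}$, its square, and its square divided by $(1 + \sin\frac{\pi}{n+1})^2$ can be evaluated for several n. This Python sketch is only an illustration: the function names are hypothetical, and the iteration count is the rough estimate obtained by asking when $\rho^k$ falls below a chosen factor.

```python
import math

def spectral_radii(n):
    """|lambda|_max for Jacobi, Gauss-Seidel and best-omega SOR on the -1, 2, -1 matrix."""
    theta = math.pi / (n + 1)
    jacobi = math.cos(theta)
    gauss_seidel = jacobi ** 2
    sor = jacobi ** 2 / (1.0 + math.sin(theta)) ** 2
    return jacobi, gauss_seidel, sor

def iterations_to_reduce(rho, factor=1e-6):
    """Smallest k with rho**k <= factor (an estimate of the work needed)."""
    return math.ceil(math.log(factor) / math.log(rho))

for n in (2, 10, 100):
    print(n, [round(r, 4) for r in spectral_radii(n)])
```

For n = 2 the three numbers are 1/2, 1/4 and about 0.0718. As n grows, the Jacobi and Gauss-Seidel values approach 1 much faster than the SOR value does, which is why the choice of $\omega$ matters for large problems.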


Let me be clear: For the —1, 2, —1 matrix you should not use any of these iterations! 
Elimination on a tridiagonal matrix is very fast (exact LU). Iterations are intended for a 
large sparse matrix that has nonzeros far from the central diagonal. Those create many 
more nonzeros in the exact L and U. This fill-in is why elimination becomes expensive. 
We mention one more splitting. The idea of “incomplete LU” is to set the small nonzeros
in L and U back to zero. This leaves triangular matrices $L_0$ and $U_0$ which are again
sparse. The splitting has $S = L_0U_0$ on the left side. Each step is quick:

$$\text{Incomplete LU}\qquad L_0U_0x_{k+1} = (L_0U_0 - A)x_k + b.$$

On the right side we do sparse matrix-vector multiplications. Don’t multiply $L_0$ times $U_0$,
those are matrices. Multiply $x_k$ by $U_0$ and then multiply that vector by $L_0$. On the left side
we do forward and back substitutions. If $L_0U_0$ is close to A, then $|\lambda|_{\max}$ is small. A few
iterations will give a close answer.


Multigrid and Conjugate Gradients 


I cannot leave the impression that Jacobi and Gauss-Seidel are great methods. Generally the 
“low-frequency” part of the error decays very slowly, and many iterations are needed. Here 
are two important ideas that bring tremendous improvement. Multigrid can solve problems 
of size n in O(n) steps. With a good preconditioner, conjugate gradients becomes one of 
the most popular and powerful algorithms in numerical linear algebra. 


Multigrid Solve smaller problems with coarser grids. Each iteration will be cheaper and 
faster. Then interpolate between the coarse grid values to get a quick headstart on the 
full-size problem. Multigrid might go 4 levels down and back. 


Conjugate gradients An ordinary iteration like $x_{k+1} = x_k - Ax_k + b$ involves multiplication
by A at each step. If A is sparse, this is not too expensive: $Ax_k$ is what we
are willing to do. It adds one more basis vector to the growing “Krylov spaces” that contain
our approximations. But $x_{k+1}$ is not the best combination of $x_0, Ax_0, \ldots, A^kx_0$.
The ordinary iterations are simple but far from optimal.

The conjugate gradient method chooses the best combination $x_k$ at every step. The
extra cost (beyond one multiplication by A) is not great. We will give the CG iteration,
emphasizing that this method was created for a symmetric positive definite matrix. When
A is not symmetric, one good choice is GMRES. When $A = A^{T}$ is not positive definite,
there is MINRES. A world of high-powered iterative methods has been created around the
idea of making optimal choices of each successive $x_k$.


My textbook Computational Science and Engineering describes multigrid and CG in 
much more detail. Among books on numerical linear algebra, Trefethen-Bau is deservedly 
popular (others are terrific too). Golub- Van Loan is a level up. 

The Problem Set reproduces the five steps in each conjugate gradient cycle from $x_{k-1}$
to $x_k$. We compute that new approximation $x_k$, the new residual $r_k = b - Ax_k$, and the
new search direction $d_k$ to look for the next $x_{k+1}$.
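For orientation, here is a minimal Python sketch of the conjugate gradient cycle in its usual textbook form: step length, new approximation, new residual, improvement ratio, new search direction. Whether it matches the Problem Set's listing line by line is an assumption, as are the helper names. It starts from $x_0 = 0$ and assumes A is symmetric positive definite.

```python
def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

def matvec(A, x):
    return [dot(row, x) for row in A]

def conjugate_gradient(A, b, steps):
    """Run `steps` CG cycles on Ax = b, starting from x = 0."""
    x = [0.0] * len(b)
    r = [float(t) for t in b]       # residual b - Ax for x = 0
    d = r[:]                        # first search direction
    for _ in range(steps):
        rr = dot(r, r)
        if rr == 0.0:
            break                   # already solved exactly
        Ad = matvec(A, d)           # the one multiplication by A per cycle
        alpha = rr / dot(d, Ad)     # 1. step length
        x = [xi + alpha * di for xi, di in zip(x, d)]      # 2. new approximation
        r = [ri - alpha * adi for ri, adi in zip(r, Ad)]   # 3. new residual
        beta = dot(r, r) / rr       # 4. improvement in this step
        d = [ri + beta * di for ri, di in zip(r, d)]       # 5. next search direction
    return x

x = conjugate_gradient([[2, -1], [-1, 2]], [4, -2], 2)
```

On the 2 by 2 system 2u - v = 4, -u + 2v = -2 the sketch reaches the solution (2, 0) after two cycles, up to roundoff. In exact arithmetic CG needs at most n cycles for an n by n positive definite matrix.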

I wrote those steps for the original matrix A. But a preconditioner S can make convergence
much faster. Our original equation is Ax = b. The preconditioned equation is
$S^{-1}Ax = S^{-1}b$. Small changes in the code give the preconditioned conjugate gradient
method, the leading iterative method to solve positive definite systems.

The biggest competition is direct elimination, with the equations reordered to take maximum
advantage of the zeros in A. It is not easy to outperform Gauss.


Iterative Methods for Eigenvalues 


We move from Ax = b to $Ax = \lambda x$. Iterations are an option for linear equations. They
are a necessity for eigenvalue problems. The eigenvalues of an n by n matrix are the roots
of an nth degree polynomial. The determinant of $A - \lambda I$ starts with $(-\lambda)^n$. This book
must not leave the impression that eigenvalues should be computed that way! Working
from $\det(A - \lambda I) = 0$ is a very poor approach, except when n is small.

For n > 4 there is no formula to solve $\det(A - \lambda I) = 0$. Worse than that, the $\lambda$’s
can be very unstable and sensitive. It is much better to work with A itself, gradually making
it diagonal or triangular. (Then the eigenvalues appear on the diagonal.) Good computer
codes are available in the LAPACK library—individual routines are free on 
www.netlib.org/lapack. This library combines the earlier LINPACK and EISPACK, with 
many improvements (to use matrix-matrix operations in the Level 3 BLAS). It is a collec- 
tion of Fortran 77 programs for linear algebra on high-performance computers. For your 
computer and mine, a high quality matrix package is all we need. For supercomputers with 
parallel processing, move to ScaLAPACK and block elimination. 


We will briefly discuss the power method and the QR method (chosen by LAPACK) 
for computing eigenvalues. It makes no sense to give full details of the codes. 


1 Power methods and inverse power methods. Start with any vector $u_0$. Multiply by
A to find $u_1$. Multiply by A again to find $u_2$. If $u_0$ is a combination of the eigenvectors,
then A multiplies each eigenvector $x_i$ by $\lambda_i$. After k steps we have $(\lambda_i)^k$:

$$u_k = A^k u_0 = c_1(\lambda_1)^k x_1 + \cdots + c_n(\lambda_n)^k x_n \qquad (10)$$

As the power method continues, the largest eigenvalue begins to dominate. The vectors
$u_k$ point toward that dominant eigenvector $x_1$. We saw this for Markov matrices:


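$$A = \begin{bmatrix} .9 & .3 \\ .1 & .7 \end{bmatrix} \quad\text{has}\quad \lambda_{\max} = 1 \quad\text{with eigenvector}\quad \begin{bmatrix} .75 \\ .25 \end{bmatrix}.$$

Start with $u_0$ and multiply at every step by A:

$$u_0 = \begin{bmatrix} 1 \\ 0 \end{bmatrix},\quad u_1 = \begin{bmatrix} .9 \\ .1 \end{bmatrix},\quad u_2 = \begin{bmatrix} .84 \\ .16 \end{bmatrix} \quad\text{is approaching}\quad u_\infty = \begin{bmatrix} .75 \\ .25 \end{bmatrix}.$$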


The speed of convergence depends on the ratio of the second largest eigenvalue $\lambda_2$ to the
largest $\lambda_1$. We don’t want $\lambda_1$ to be small, we want $|\lambda_2/\lambda_1|$ to be small. Here $\lambda_2 = .6$ and
$\lambda_1 = 1$, giving good speed. For large matrices it often happens that $|\lambda_2/\lambda_1|$ is very close
to 1. Then the power method is too slow.
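
The Markov example above can be replayed in a few lines. This is a minimal Python sketch, not code from the book; the helper names `matvec` and `power_method` are illustrative. No rescaling is done because $\lambda_{\max} = 1$ here; for a general matrix one would normally rescale $u_k$ at every step (an assumption about common practice, not something stated in the text).

```python
def matvec(A, u):
    """Multiply a small dense matrix (list of rows) by a vector."""
    return [sum(a * x for a, x in zip(row, u)) for row in A]

def power_method(A, u0, steps):
    """Return the iterates u_0, u_1, ..., u_steps with u_{k+1} = A u_k."""
    iterates = [list(u0)]
    for _ in range(steps):
        iterates.append(matvec(A, iterates[-1]))
    return iterates

A = [[0.9, 0.3],
     [0.1, 0.7]]
iterates = power_method(A, [1.0, 0.0], 40)
u1, u2, u_last = iterates[1], iterates[2], iterates[-1]

# The distance to the limit (.75, .25) shrinks by lambda_2 / lambda_1 = .6 per step.
errors = [abs(u[0] - 0.75) for u in iterates[:4]]
```

The first entries of `errors` are .25, .15, .09, .054: each is .6 times the one before, which is the ratio $|\lambda_2/\lambda_1|$ at work.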

Is there a way to find the smallest eigenvalue, which is often the most important in
applications? Yes, by the inverse power method: Multiply $u_0$ by $A^{-1}$ instead of A. Since
we never want to compute $A^{-1}$, we actually solve $Au_1 = u_0$. By saving the LU factors,
the next step $Au_2 = u_1$ is fast. Step k has $Au_k = u_{k-1}$:

$$\text{Inverse power method} \qquad u_k = A^{-k}u_0 = \frac{c_1 x_1}{(\lambda_1)^k} + \cdots + \frac{c_n x_n}{(\lambda_n)^k} \qquad (11)$$

Now the smallest eigenvalue $\lambda_{\min}$ is in control. When it is very small, the factor $1/\lambda_{\min}^k$
is large. For high speed, we make $\lambda_{\min}$ even smaller by shifting the matrix to $A - \lambda^* I$.

That shift doesn’t change the eigenvectors. ($\lambda^*$ might come from the diagonal of A,
even better is a Rayleigh quotient $x^TAx/x^Tx$). If $\lambda^*$ is close to $\lambda_{\min}$ then $(A - \lambda^* I)^{-1}$
has the very large eigenvalue $(\lambda_{\min} - \lambda^*)^{-1}$. Each shifted inverse power step multiplies
the eigenvector by this big number, and that eigenvector quickly dominates.
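
A shifted inverse power iteration can be sketched as follows. This is an illustrative Python sketch under stated assumptions: the test matrix (the 3 by 3 matrix with diagonals $-1, 2, -1$), the fixed shift 0.5, and the helper names are all choices made here, not taken from the book. For brevity the sketch re-solves the shifted system by elimination at every step; a practical code would factor $A - \lambda^* I$ once and reuse the LU factors, as the text recommends.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting (small dense A)."""
    n = len(A)
    M = [list(A[i]) + [b[i]] for i in range(n)]          # augmented matrix [A b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):                         # back substitution
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

def rayleigh_quotient(A, x):
    """Return x^T A x / x^T x."""
    Ax = [sum(a * xj for a, xj in zip(row, x)) for row in A]
    return sum(xi * yi for xi, yi in zip(x, Ax)) / sum(xi * xi for xi in x)

def shifted_inverse_power(A, u0, shift, steps):
    """Repeat: solve (A - shift*I) u_new = u, then rescale u_new to unit length."""
    n = len(A)
    B = [[A[i][j] - (shift if i == j else 0.0) for j in range(n)] for i in range(n)]
    u = list(u0)
    for _ in range(steps):
        u = solve(B, u)
        norm = sum(x * x for x in u) ** 0.5
        u = [x / norm for x in u]
    return rayleigh_quotient(A, u), u

A = [[2.0, -1.0, 0.0],
     [-1.0, 2.0, -1.0],
     [0.0, -1.0, 2.0]]
lam, u = shifted_inverse_power(A, [1.0, 0.0, 0.0], shift=0.5, steps=10)
```

The eigenvalues of this A are $2 - \sqrt2$, $2$, $2 + \sqrt2$. With the shift 0.5 the nearest one is $\lambda_{\min} = 2 - \sqrt2 \approx 0.586$, and `lam` settles on it after a handful of steps.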


2 The QR Method This is a major achievement in numerical linear algebra. Sixty years 
ago, eigenvalue computations were slow and inaccurate. We didn’t even realize that solving
$\det(A - \lambda I) = 0$ was a terrible method. Jacobi had suggested earlier that A should
gradually be made triangular—then the eigenvalues appear automatically on the diagonal. 
He used 2 by 2 rotations to produce off-diagonal zeros. (Unfortunately the previous zeros 
can become nonzero again. But Jacobi’s method made a partial comeback with parallel 
computers.) The QR method is now a leader in eigenvalue computations. 

The basic step is to factor A, whose eigenvalues we want, into QR. Remember from
Gram-Schmidt (Section 4.4) that Q has orthonormal columns and R is triangular. For
eigenvalues the key idea is: Reverse Q and R. The new matrix (same $\lambda$’s) is $A_1 = RQ$.
The eigenvalues are not changed in RQ because $A = QR$ is similar to $A_1 = Q^{-1}AQ$:

$$A_1 = RQ \text{ has the same } \lambda \qquad QRx = \lambda x \ \text{ gives } \ RQ(Q^{-1}x) = \lambda(Q^{-1}x). \qquad (12)$$


This process continues. Factor the new matrix $A_1$ into $Q_1R_1$. Then reverse the factors
to $R_1Q_1$. This is the similar matrix $A_2$ and again no change in the eigenvalues. Amazingly,
those eigenvalues begin to show up on the diagonal. Soon the last entry of $A_4$ holds an
accurate eigenvalue. In that case we remove the last row and column and continue with a
smaller matrix to find the next eigenvalue.

Two extra ideas make this method a success. One is to shift the matrix by a multiple of
I, before factoring into QR. Then RQ is shifted back to give $A_{k+1}$:

Factor $A_k - c_kI$ into $Q_kR_k$. The next matrix is $A_{k+1} = R_kQ_k + c_kI$.

$A_{k+1}$ has the same eigenvalues as $A_k$, and the same as the original $A_0 = A$. A good shift
chooses c near an (unknown) eigenvalue. That eigenvalue appears more accurately on the
diagonal of $A_{k+1}$, which tells us a better c for the next step to $A_{k+2}$.
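
The shifted step can be tried on a tiny example. The following Python sketch is illustrative only: it factors by classical Gram-Schmidt (production codes such as LAPACK use more stable orthogonal transformations), it takes the shift $c_k$ to be the last diagonal entry of $A_k$ (one common choice, assumed here), and it stops when the entry to the left of the last diagonal entry is tiny. The 2 by 2 test matrix and all function names are made up for the demonstration.

```python
def qr_gram_schmidt(A):
    """Factor a small square matrix A (list of rows) into Q R by Gram-Schmidt."""
    n = len(A)
    cols = [[A[i][j] for i in range(n)] for j in range(n)]    # columns of A
    q = []                                                    # orthonormal columns
    R = [[0.0] * n for _ in range(n)]
    for j in range(n):
        v = list(cols[j])
        for i in range(j):
            R[i][j] = sum(q[i][k] * cols[j][k] for k in range(n))
            v = [v[k] - R[i][j] * q[i][k] for k in range(n)]
        R[j][j] = sum(x * x for x in v) ** 0.5
        q.append([x / R[j][j] for x in v])
    Q = [[q[j][i] for j in range(n)] for i in range(n)]       # columns back to rows
    return Q, R

def matmul(X, Y):
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def shifted_qr(A, tol=1e-8, max_steps=50):
    """Repeat: factor A_k - c I = Q R, then set A_{k+1} = R Q + c I."""
    n = len(A)
    Ak = [list(row) for row in A]
    steps = 0
    while steps < max_steps and abs(Ak[n - 1][n - 2]) > tol:
        c = Ak[n - 1][n - 1]                                  # shift = last diagonal entry
        shifted = [[Ak[i][j] - (c if i == j else 0.0) for j in range(n)]
                   for i in range(n)]
        Q, R = qr_gram_schmidt(shifted)
        RQ = matmul(R, Q)
        Ak = [[RQ[i][j] + (c if i == j else 0.0) for j in range(n)]
              for i in range(n)]
        steps += 1
    return Ak, steps

A = [[2.0, 1.0],
     [1.0, 3.0]]
A_final, steps = shifted_qr(A)
```

For this symmetric matrix the off-diagonal entry goes roughly 1, 0.5, 0.03, 0.000005, and then drops below the tolerance, while the last diagonal entry approaches the eigenvalue $(5 + \sqrt5)/2 \approx 3.618$. The trace stays at 5 because every step is a similarity transformation.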

The second idea is to obtain off-diagonal zeros before the QR method starts. An
elimination step E will do it, or a Givens rotation, but don’t forget $E^{-1}$ (or $\lambda$ will change):

$$EAE^{-1} = \begin{bmatrix} 1 & & \\ & 1 & \\ & -1 & 1 \end{bmatrix} \begin{bmatrix} 1 & 2 & 3 \\ 1 & 4 & 5 \\ 1 & 6 & 7 \end{bmatrix} \begin{bmatrix} 1 & & \\ & 1 & \\ & 1 & 1 \end{bmatrix} = \begin{bmatrix} 1 & 5 & 3 \\ 1 & 9 & 5 \\ 0 & 4 & 2 \end{bmatrix}. \quad \text{Same } \lambda\text{’s}.$$

We must leave those nonzeros 1 and 4 along one subdiagonal. More E’s could remove
them, but $E^{-1}$ would fill them in again. This is a “Hessenberg matrix” (one nonzero
subdiagonal). The zeros in the lower left corner will stay zero through the QR method.
The operation count for each QR factorization drops from $O(n^3)$ to $O(n^2)$.

Golub and Van Loan give this example of one shifted QR step on a Hessenberg matrix.
The shift is 7I, taking 7 from all diagonal entries of A (then shifting back for $A_1$):

$$A = \begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 0 & .001 & 7 \end{bmatrix} \quad\text{leads to}\quad A_1 = \begin{bmatrix} -.54 & 1.69 & 0.835 \\ .31 & 6.53 & -6.656 \\ 0 & .00002 & 7.012 \end{bmatrix}$$

Factoring $A - 7I$ into QR produced $A_1 = RQ + 7I$. Notice the very small number .00002.
The diagonal entry 7.012 is almost an exact eigenvalue of $A_1$, and therefore of A. Another
QR step on $A_1$ with shift by 7.012I would give terrific accuracy.


For a few eigenvalues of a large sparse matrix I would look to ARPACK. 
Problems 25-27 describe the Arnoldi iteration that orthogonalizes the basis—each step 
has only three terms when A is symmetric. The matrix becomes tridiagonal: a wonderful 
start for computing eigenvalues. 


Problem Set 11.3 


Problems 1-12 are about iterative methods for Ax = b. 


1 Change $Ax = b$ to $x = (I - A)x + b$. What are S and T for this splitting? What
matrix $S^{-1}T$ controls the convergence of $x_{k+1} = (I - A)x_k + b$?


2 If $\lambda$ is an eigenvalue of A, then ____ is an eigenvalue of $B = I - A$. The real
eigenvalues of B have absolute value less than 1 if the real eigenvalues of A lie
between ____ and ____.


3 Show why the iteration $x_{k+1} = (I - A)x_k + b$ does not converge for $A = \begin{bmatrix} 2 & -1 \\ -1 & 2 \end{bmatrix}$.


4 Why is the norm of $B^k$ never larger than $\|B\|^k$? Then $\|B\| < 1$ guarantees that the
powers $B^k$ approach zero (convergence). No surprise since $|\lambda|_{\max}$ is below $\|B\|$.


5 If A is singular then all splittings $A = S - T$ must fail. From $Ax = 0$ show that
$S^{-1}Tx = x$. So this matrix $B = S^{-1}T$ has $\lambda = 1$ and fails.


6 Change the 2’s to 3’s and find the eigenvalues of $S^{-1}T$ for Jacobi’s method:

$$Sx_{k+1} = Tx_k + b \quad\text{is}\quad \begin{bmatrix} 3 & 0 \\ 0 & 3 \end{bmatrix} x_{k+1} = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix} x_k + b.$$


7 Find the eigenvalues of $S^{-1}T$ for the Gauss-Seidel method applied to Problem 6:

$$\begin{bmatrix} 3 & 0 \\ -1 & 3 \end{bmatrix} x_{k+1} = \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix} x_k + b.$$

Does $|\lambda|_{\max}$ for Gauss-Seidel equal $|\lambda|^2_{\max}$ for Jacobi?


8 For any 2 by 2 matrix $\begin{bmatrix} a & b \\ c & d \end{bmatrix}$ show that $|\lambda|_{\max}$ equals $|bc/ad|$ for Gauss-Seidel and
$|bc/ad|^{1/2}$ for Jacobi. We need $ad \ne 0$ for the matrix S to be invertible.


9 Write a computer code (MATLAB or other) for the Gauss-Seidel method. You can
define S and T from A, or set up the iteration loop directly from the entries $a_{ij}$. Test
it on the $-1, 2, -1$ matrices A of order 10, 20, 50 with $b = (1, 0, \ldots, 0)$.


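10 The Gauss-Seidel iteration at component i uses earlier parts of $x^{\text{new}}$:

$$\text{Gauss-Seidel} \qquad x_i^{\text{new}} = x_i^{\text{old}} + \frac{1}{a_{ii}}\Big(b_i - \sum_{j=1}^{i-1} a_{ij}x_j^{\text{new}} - \sum_{j=i}^{n} a_{ij}x_j^{\text{old}}\Big).$$

If every $x_i^{\text{new}} = x_i^{\text{old}}$ how does this show that the solution x is correct? How does
the formula change for Jacobi’s method? For SOR insert $\omega$ outside the parentheses.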


11 Divide equation (10) by $\lambda_1^k$ and explain why $|\lambda_2/\lambda_1|$ controls the convergence of the
power method. Construct a matrix A for which this method does not converge.


12 The Markov matrix $A = \begin{bmatrix} .9 & .3 \\ .1 & .7 \end{bmatrix}$ has $\lambda = 1$ and .6, and the power method $u_k = A^ku_0$
converges to $\begin{bmatrix} .75 \\ .25 \end{bmatrix}$. Find the eigenvectors of $A^{-1}$. What does the inverse
power method $u_{-k} = A^{-k}u_0$ converge to (after you multiply by $.6^k$)?


13 The tridiagonal matrix of size $n - 1$ with diagonals $-1, 2, -1$ has eigenvalues
$\lambda_j = 2 - 2\cos(j\pi/n)$. Why are the smallest eigenvalues approximately $(j\pi/n)^2$?
The inverse power method converges at the speed $\lambda_1/\lambda_2 \approx 1/4$.


14 For $A = \begin{bmatrix} 2 & -1 \\ -1 & 2 \end{bmatrix}$ apply the power method $u_{k+1} = Au_k$ three times starting with
$u_0 = \begin{bmatrix} 1 \\ 0 \end{bmatrix}$. What eigenvector is the power method converging to?


15 For A = $-1, 2, -1$ matrix, apply the inverse power method $u_{k+1} = A^{-1}u_k$ three
times with the same $u_0$. What eigenvector are the $u_k$’s approaching?


16 In the QR method for eigenvalues when A is shifted to make $A_{22} = 0$, show that the
2,1 entry drops from $\sin\theta$ in $A = QR$ to $-\sin^3\theta$ in RQ. (Compute R and RQ.)
This “cubic convergence” makes the method a success:

$$A = \begin{bmatrix} \cos\theta & \sin\theta \\ \sin\theta & 0 \end{bmatrix} = QR = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix} \begin{bmatrix} 1 & ? \\ 0 & ? \end{bmatrix}.$$


17 If A is an orthogonal matrix, its QR factorization has Q = ____ and R = ____.
Therefore RQ = ____. These are among the rare examples when the QR method
goes nowhere.


18 The shifted QR method factors $A - cI$ into QR. Show that the next matrix $A_1 =
RQ + cI$ equals $Q^{-1}AQ$. Therefore $A_1$ has the ____ eigenvalues as A (but $A_1$ is
closer to triangular).


19 When $A = A^T$, the “Lanczos method” finds a’s and b’s and orthonormal q’s so that
$Aq_j = b_{j-1}q_{j-1} + a_jq_j + b_jq_{j+1}$ (with $q_0 = 0$). Multiply by $q_j^T$ to find a formula
for $a_j$. The equation says that $AQ = QT$ where T is a tridiagonal matrix.


20 The equation in Problem 19 develops from this loop with $b_0 = 1$ and $r_0$ = any $q_1$:
$q_{j+1} = r_j/b_j$; $j = j + 1$; $a_j = q_j^TAq_j$; $r_j = Aq_j - b_{j-1}q_{j-1} - a_jq_j$; $b_j = \|r_j\|$.
Write a code and test it on the $-1, 2, -1$ matrix A. $Q^TQ$ should be I.


21 Suppose A is tridiagonal and symmetric in the QR method. From $A_1 = Q^{-1}AQ$
show that $A_1$ is symmetric. Write $A_1 = RAR^{-1}$ to show that $A_1$ is also tridiagonal.
(If the lower part of $A_1$ is proved tridiagonal then by symmetry the upper part is too.)
Symmetric tridiagonal matrices are the best way to start in the QR method.


Problems 22-25 present two fundamental iterations. Each step involves Aq or Ad. 


The key point for large matrices is that matrix-vector multiplication is much faster
than matrix-matrix multiplication. A crucial construction starts with a vector b.
Repeated multiplication will produce $Ab, A^2b, \ldots$ but those vectors are far from orthogonal.
The “Arnoldi iteration” creates an orthonormal basis $q_1, q_2, \ldots$ for the same space by the
Gram-Schmidt idea: orthogonalize each new $Aq_n$ against the previous $q_1, \ldots, q_{n-1}$. The
“Krylov space” spanned by $b, Ab, \ldots, A^{n-1}b$ then has a much better basis $q_1, \ldots, q_n$.
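
The orthogonalization just described can be sketched in Python. This is an illustrative sketch, not the book’s code: the function name `arnoldi`, the dictionary `h` of Hessenberg entries (indexed from 0 rather than 1), and the 4 by 4 test matrix with diagonals $-1, 2, -1$ are assumptions made for the demonstration. Each pass uses one matrix-vector product $Aq_n$ and then subtracts the components along the earlier q’s.

```python
def matvec(A, x):
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]

def dot(x, y):
    return sum(xi * yi for xi, yi in zip(x, y))

def arnoldi(A, b, N):
    """Return orthonormal vectors q[0..N-1] and Hessenberg entries h[(i, j)] (0-based)."""
    norm_b = dot(b, b) ** 0.5
    q = [[x / norm_b for x in b]]
    h = {}
    for n in range(N - 1):
        v = matvec(A, q[n])                       # one matrix-vector multiplication
        for j in range(n + 1):
            h[(j, n)] = dot(q[j], v)              # component of v along q_j
            v = [vi - h[(j, n)] * qi for vi, qi in zip(v, q[j])]
        h[(n + 1, n)] = dot(v, v) ** 0.5          # length of what is left
        q.append([vi / h[(n + 1, n)] for vi in v])
    return q, h

A = [[2.0, -1.0, 0.0, 0.0],
     [-1.0, 2.0, -1.0, 0.0],
     [0.0, -1.0, 2.0, -1.0],
     [0.0, 0.0, -1.0, 2.0]]
q, h = arnoldi(A, [1.0, 0.0, 0.0, 0.0], 4)
```

Because this A is symmetric, the entry `h[(0, 2)]` above the first superdiagonal comes out zero: the Hessenberg matrix is tridiagonal. The sketch assumes no breakdown, so it would divide by zero if some `h[(n + 1, n)]` were exactly zero.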


Here in pseudocode are two of the most important algorithms in numerical linear
algebra: Arnoldi gives a good basis and CG gives a good approximation to $x = A^{-1}b$.


Arnoldi Iteration
q_1 = b/||b||
for n = 1 to N-1
    v = A q_n
    for j = 1 to n
        h_jn = q_j^T v
        v = v - h_jn q_j
    h_{n+1,n} = ||v||
    q_{n+1} = v/h_{n+1,n}

Conjugate Gradient Iteration for Positive Definite A
x_0 = 0, r_0 = b, d_0 = r_0
for n = 1 to N
    alpha_n = (r_{n-1}^T r_{n-1})/(d_{n-1}^T A d_{n-1})    step length x_{n-1} to x_n
    x_n = x_{n-1} + alpha_n d_{n-1}                        approximate solution
    r_n = r_{n-1} - alpha_n A d_{n-1}                      new residual b - A x_n
    beta_n = (r_n^T r_n)/(r_{n-1}^T r_{n-1})               improvement this step
    d_n = r_n + beta_n d_{n-1}                             next search direction

% Notice: only 1 matrix-vector multiplication Aq and Ad


For conjugate gradients, the residuals $r_n$ are orthogonal and the search directions are
A-orthogonal: all $d_j^TAd_k = 0$. The iteration solves $Ax = b$ by minimizing the error $e^TAe$
over all vectors in the Krylov space = span of $b, Ab, \ldots, A^{n-1}b$. It is a fantastic algorithm.
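
The conjugate gradient cycle can be written out in Python almost line for line. This is a minimal sketch under stated assumptions: A is symmetric positive definite, the helper names are made up here, and the early exit when the residual is exactly zero is an addition to avoid dividing 0 by 0. The small test problem (the 4 by 4 matrix with diagonals $-1, 2, -1$ and b = ones) is chosen for illustration.

```python
def matvec(A, x):
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]

def dot(x, y):
    return sum(xi * yi for xi, yi in zip(x, y))

def conjugate_gradient(A, b, steps):
    """Run at most `steps` CG cycles starting from x_0 = 0."""
    x = [0.0] * len(b)
    r = list(b)                                   # r_0 = b - A x_0 = b
    d = list(b)                                   # d_0 = r_0
    rr = dot(r, r)
    for _ in range(steps):
        if rr == 0.0:                             # exact solution reached
            break
        Ad = matvec(A, d)                         # the only matrix-vector product
        alpha = rr / dot(d, Ad)                   # step length
        x = [xi + alpha * di for xi, di in zip(x, d)]
        r = [ri - alpha * adi for ri, adi in zip(r, Ad)]
        rr_new = dot(r, r)
        beta = rr_new / rr                        # improvement this step
        d = [ri + beta * di for ri, di in zip(r, d)]
        rr = rr_new
    return x

A = [[2.0, -1.0, 0.0, 0.0],
     [-1.0, 2.0, -1.0, 0.0],
     [0.0, -1.0, 2.0, -1.0],
     [0.0, 0.0, -1.0, 2.0]]
x = conjugate_gradient(A, [1.0, 1.0, 1.0, 1.0], steps=4)
```

For this b the first step gives $x_1 = (2, 2, 2, 2)$ and the second gives the exact solution $(2, 3, 3, 2)$: the Krylov space of this symmetric right side has dimension 2, so two steps are enough.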


22 For the diagonal matrix A = diag([1 2 3 4]) and the vector b = (1, 1, 1, 1),
go through one Arnoldi step to find the orthonormal vectors $q_1$ and $q_2$.


23 Arnoldi’s method is finding Q so that AQ = QH (column by column):

$$H = \begin{bmatrix} h_{11} & h_{12} & \cdot & h_{1N} \\ h_{21} & h_{22} & \cdot & h_{2N} \\ 0 & h_{32} & \cdot & \cdot \\ 0 & 0 & \cdot & h_{NN} \end{bmatrix}$$

H is a “Hessenberg matrix” with one nonzero subdiagonal. Here is the crucial fact
when A is symmetric: The Hessenberg matrix $H = Q^{-1}AQ = Q^TAQ$ is
symmetric and therefore it is tridiagonal. Explain that sentence.


24 This tridiagonal H (when A is symmetric) gives the Lanczos iteration:

$$\text{Three terms only} \qquad q_{j+1} = (Aq_j - h_{j,j}q_j - h_{j-1,j}q_{j-1})/h_{j+1,j}$$

From $H = Q^{-1}AQ$, why are the eigenvalues of H the same as the eigenvalues
of A? For large matrices, the “Lanczos method” computes the leading eigenvalues
by stopping at a smaller tridiagonal matrix $H_k$. The QR method in the text is applied
to compute the eigenvalues of $H_k$.


25 Apply the conjugate gradient method to solve Ax = b = ones(100,1), where A is
the $-1, 2, -1$ second difference matrix A = toeplitz([2 -1 zeros(1,98)]). Graph
$x_{10}$ and $x_{20}$ from CG, along with the exact solution x. (Its 100 components are
$x_i = (ih - i^2h^2)/2$ with h = 1/101. “plot(i, x(i))” should produce a parabola.)


26 For unsymmetric matrices, the spectral radius $\rho = \max|\lambda_i|$ is not a norm.
But still $\|A^n\|$ grows or decays like $\rho^n$ for large n. Compare those numbers for
A = [1 1; 0 1.1] using the command norm.

$A^n \to 0$ if and only if $\rho < 1$. When $A = S^{-1}T$, this is the key to convergence.


Chapter 12 


Linear Algebra in Probability & Statistics 


12.1 Mean, Variance, and Probability 


We are starting with the three fundamental words of this chapter: mean, variance, and 
probability. Let me give a rough explanation of their meaning before I write any formulas : 
The mean is the average value or expected value
The variance $\sigma^2$ measures the average squared distance from the mean m


The probabilities of n different outcomes are positive numbers $p_1, \ldots, p_n$ adding to 1.


Certainly the mean is easy to understand. We will start there. But right away we have 
two different situations that you have to keep straight. On the one hand, we may have 
the results (sample values) from a completed trial. On the other hand, we may have the 
expected results (expected values) from future trials. Let me give examples: 


Sample values Five random freshmen have ages 18, 17, 18, 19, 17

Sample mean $\frac{1}{5}(18 + 17 + 18 + 19 + 17) = 17.8$

Probabilities The ages in a freshmen class are 17 (20%), 18 (50%), 19 (30%)

A random freshman has expected age $E[x] = (0.2)\,17 + (0.5)\,18 + (0.3)\,19 = 18.1$


Both numbers 17.8 and 18.1 are correct averages. The sample mean starts with N samples
$x_1, \ldots, x_N$ from a completed trial. Their mean is the average of the N observed samples:

$$\text{Sample mean} \qquad m = \mu = \frac{1}{N}(x_1 + x_2 + \cdots + x_N) \qquad (1)$$

The expected value of x starts with the probabilities $p_1, \ldots, p_n$ of the ages $x_1, \ldots, x_n$:

$$\text{Expected value} \qquad m = E[x] = p_1x_1 + p_2x_2 + \cdots + p_nx_n. \qquad (2)$$

This is $p \cdot x$. Notice that $m = E[x]$ tells us what to expect, $m = \mu$ tells us what we got.
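
The two averages can be computed side by side. This short Python sketch uses the freshman numbers from the examples above; the variable names are illustrative.

```python
# Sample mean: average of observed values from a completed trial.
ages = [18, 17, 18, 19, 17]
sample_mean = sum(ages) / len(ages)                  # what we got

# Expected value: the dot product p . x of probabilities and outcomes.
outcomes = [17, 18, 19]
probabilities = [0.2, 0.5, 0.3]
expected_value = sum(p * x for p, x in zip(probabilities, outcomes))   # what to expect
```

Up to floating-point roundoff, `sample_mean` is 17.8 and `expected_value` is 18.1.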


By taking many samples (large N), the sample results will come close to the
probabilities. The “Law of Large Numbers” says that with probability 1, the sample mean will
converge to its expected value $E[x]$ as the sample size N increases. A fair coin has
probability $p_0 = \frac12$ of tails and $p_1 = \frac12$ of heads. Then $E[x] = \frac12(0) + \frac12(1) = \frac12$. The fraction of
heads in N flips of the coin is the sample mean, expected to approach $E[x] = \frac12$.

This does not mean that if we have seen more tails than heads, the next sample is likely
to be heads. The odds remain 50-50. The first 100 or 1000 flips do affect the sample mean.
But 1000 flips will not affect its limit, because you are dividing by $N \to \infty$.


Variance (around the mean) 


The variance $\sigma^2$ measures expected distance (squared) from the expected mean $E[x]$.

The sample variance $S^2$ measures actual distance (squared) from the sample mean. The
square root is the standard deviation $\sigma$ or S. After an exam, I email $\mu$ and S to the class.
I don’t know the expected mean and variance because I don’t know the probabilities $p_1$ to
$p_{100}$ for each score. (After teaching for 50 years, I still have no idea what to expect.)

The deviation is always deviation from the mean, sample or expected. We are looking
for the size of the “spread” around the mean value x = m. Start with N samples.

$$\text{Sample variance} \qquad S^2 = \frac{1}{N-1}\left[(x_1 - m)^2 + \cdots + (x_N - m)^2\right] \qquad (3)$$

The sample ages x = 18, 17, 18, 19, 17 have mean m = 17.8. That sample has variance 0.7:

$$S^2 = \frac14\left[(.2)^2 + (-.8)^2 + (.2)^2 + (1.2)^2 + (-.8)^2\right] = \frac14(2.8) = 0.7$$

The minus signs disappear when we compute squares. Please notice! Statisticians divide
by N − 1 = 4 (and not N = 5) so that $S^2$ is an unbiased estimate of $\sigma^2$. One degree of
freedom is already accounted for in the sample mean.
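
The division by N − 1 is easy to check numerically. In this Python sketch the function `sample_variance` is written out by hand for illustration; `statistics.variance` from the standard library also divides by N − 1, so the two should agree.

```python
import statistics

def sample_variance(samples):
    """Sum of squared distances from the sample mean, divided by N - 1."""
    n = len(samples)
    m = sum(samples) / n
    return sum((x - m) ** 2 for x in samples) / (n - 1)

ages = [18, 17, 18, 19, 17]
s_squared = sample_variance(ages)            # 2.8 / 4
library_value = statistics.variance(ages)    # standard library, also N - 1
```

Both values equal 0.7 up to roundoff. Dividing by N = 5 instead would give 0.56.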

An important identity comes from splitting each $(x - m)^2$ into $x^2 - 2mx + m^2$:

$$\begin{aligned} \text{sum of } (x_i - m)^2 &= (\text{sum of } x_i^2) - 2m(\text{sum of } x_i) + (\text{sum of } m^2) \\ &= (\text{sum of } x_i^2) - 2m(Nm) + Nm^2 \\ \text{sum of } (x_i - m)^2 &= (\text{sum of } x_i^2) - Nm^2. \qquad (4) \end{aligned}$$

This is an equivalent way to find $(x_1 - m)^2 + \cdots + (x_N - m)^2$ by adding $x_1^2 + \cdots + x_N^2$.

Now start with probabilities $p_i$ (never negative!) instead of samples. We find expected
values instead of sample values. The variance $\sigma^2$ is the crucial number in statistics.

$$\text{Variance} \qquad \sigma^2 = E\left[(x - m)^2\right] = p_1(x_1 - m)^2 + \cdots + p_n(x_n - m)^2. \qquad (5)$$

We are squaring the distance from the expected value $m = E[x]$. We don’t have samples,
only expectations. We know probabilities but we don’t know experimental outcomes.


Example 1 Find the variance $\sigma^2$ of the ages of college freshmen.


Solution The probabilities of ages $x_i = 17, 18, 19$ were $p_i = 0.2$ and 0.5 and 0.3.
The expected value was $m = \sum p_ix_i = 18.1$. The variance uses those same probabilities:

$$\begin{aligned} \sigma^2 &= (0.2)(17 - 18.1)^2 + (0.5)(18 - 18.1)^2 + (0.3)(19 - 18.1)^2 \\ &= (0.2)(1.21) + (0.5)(0.01) + (0.3)(0.81) = 0.49. \end{aligned}$$

The standard deviation is the square root $\sigma = 0.7$.
This measures the spread of 17, 18, 19 around $E[x]$, weighted by probabilities .2, .5, .3.
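
Example 1 can be reproduced directly from the definitions of the expected value and the variance. The function name `mean_and_variance` in this Python sketch is illustrative.

```python
def mean_and_variance(outcomes, probabilities):
    """Expected value m = sum p_i x_i and variance sum p_i (x_i - m)^2."""
    m = sum(p * x for p, x in zip(probabilities, outcomes))
    variance = sum(p * (x - m) ** 2 for p, x in zip(probabilities, outcomes))
    return m, variance

m, variance = mean_and_variance([17, 18, 19], [0.2, 0.5, 0.3])
standard_deviation = variance ** 0.5
```

Up to roundoff this gives m = 18.1, variance 0.49, and standard deviation 0.7.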


Continuous Probability Distributions 


Up to now we have allowed for n possible outcomes $x_1, \ldots, x_n$. With ages 17, 18, 19,
we only had n = 3. If we measure age in days instead of years, there will be a thousand
possible ages (too many). Better to allow every number between 17 and 20, a continuum
of possible ages. Then the probabilities $p_1, p_2, p_3$ for ages $x_1, x_2, x_3$ have to move to a
probability distribution p(x) for a whole continuous range of ages $17 \le x \le 20$.

The best way to explain probability distributions is to give you two examples. They 
will be the uniform distribution and the normal distribution. The first (uniform) is easy. 
The normal distribution is all-important. 


Uniform distribution Suppose ages are uniformly distributed between 17.0 and 20.0.
All ages between those numbers are “equally likely”. Of course any one exact age has no
chance at all. There is zero probability that you will hit the exact number x = 17.1 or
$x = 17 + \sqrt2$. What you can truthfully provide (assuming our uniform distribution) is
the chance F(x) that a random freshman has age less than x:


The chance of age less than x = 17 is F(17) = 0        $x \le 17$ won’t happen
The chance of age less than x = 20 is F(20) = 1        $x \le 20$ will happen
The chance of age less than x is $F(x) = \frac13(x - 17)$        F goes from 0 to 1


That formula $F(x) = \frac13(x - 17)$ gives F = 0 at x = 17; then $x \le 17$ won’t happen. It
gives F(x) = 1 at x = 20; then $x \le 20$ is sure. Between 17 and 20, the graph of the
cumulative distribution F(x) increases linearly for this uniform model.

Let me draw the graphs of F(x) and its derivative p(x) = “probability density function”.


[Figure 12.1 graphs: left panel, cumulative F(x) = probability that a sample is below x,
with $F(x) = \frac13(x - 17)$ rising from 17 to 20; right panel, “pdf” p(x) = dF/dx = probability
that a sample is near x, constant between 17 and 20.]


Figure 12.1: F(x) is the cumulative distribution and its derivative p(x) = dF/dx is the
probability density function (pdf). For this uniform distribution, p(x) is constant between
17 and 20. The total area under the graph of p(x) is the total probability F = 1.


You could say that p(x) dx is the probability of a sample falling in between x and
x + dx. This is “infinitesimally true”: p(x) dx is F(x + dx) − F(x). Here is the full truth:

$$F = \text{integral of } p \qquad \text{Probability of } a \le x \le b = \int_a^b p(x)\,dx = F(b) - F(a) \qquad (6)$$

F(b) is the probability of $x \le b$. I subtract F(a) to keep $x \ge a$. That leaves $a \le x \le b$.


Mean and Variance of p(x) 


What are the mean m and variance $\sigma^2$ for a probability distribution? Previously we added
$p_ix_i$ to get the mean (expected value). With a continuous distribution we integrate $x\,p(x)$:

$$\text{Mean} \qquad m = E[x] = \int x\,p(x)\,dx = \int_{17}^{20} (x)\,\tfrac13\,dx = 18.5$$

For this uniform distribution, the mean m is halfway between 17 and 20. Then the
probability of a random value x below this halfway point m = 18.5 is $F(m) = \frac12$.

In MATLAB, x = rand(1) chooses a random number uniformly between 0 and 1.
Then the expected mean is $m = \frac12$. The interval from 0 to x has probability F(x) = x.
The interval below the mean m always has probability $F(m) = \frac12$.

The variance is the average squared distance to the mean. With N outcomes, $\sigma^2$ is the
sum of $p_i(x_i - m)^2$. For a continuous random variable x, the sum changes to an integral.

$$\text{Variance} \qquad \sigma^2 = E\left[(x - m)^2\right] = \int p(x)\,(x - m)^2\,dx \qquad (7)$$

When ages are uniform between $17 \le x \le 20$, the integral can shift to $0 \le x \le 3$:

$$\sigma^2 = \int_{17}^{20} \frac13 (x - 18.5)^2\,dx = \int_0^3 \frac13 (x - 1.5)^2\,dx = \frac19 (x - 1.5)^3 \Big|_{x=0}^{x=3} = \frac29 (1.5)^3 = \frac34.$$


That is a typical example, and here is the complete picture for a uniform p(x), 0 to a.

$$\text{Uniform distribution for } 0 \le x \le a \qquad \text{Density } p(x) = \frac1a \qquad \text{Cumulative } F(x) = \frac xa$$

$$\text{Mean } m = \frac a2 \text{ halfway} \qquad \text{Variance } \sigma^2 = \int_0^a \frac1a \left(x - \frac a2\right)^2 dx = \frac{a^2}{12} \qquad (8)$$

The mean is a multiple of a, the variance is a multiple of $a^2$. For a = 3, $\sigma^2 = \frac{9}{12} = \frac34$.

For one random number between 0 and 1 (mean $\frac12$) the variance is $\sigma^2 = \frac{1}{12}$.
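
Formula (8) can be checked by approximating the two integrals with a midpoint sum. This Python sketch is only a numerical check; the function name `uniform_moments` and the number of subintervals are arbitrary choices made here.

```python
def uniform_moments(a, pieces=10000):
    """Midpoint-rule approximations of the mean and variance of the uniform density 1/a on [0, a]."""
    h = a / pieces
    density = 1.0 / a
    midpoints = [(k + 0.5) * h for k in range(pieces)]
    mean = sum(x * density * h for x in midpoints)
    variance = sum((x - mean) ** 2 * density * h for x in midpoints)
    return mean, variance

mean_3, variance_3 = uniform_moments(3.0)    # compare with a/2 = 1.5 and a^2/12 = 0.75
mean_1, variance_1 = uniform_moments(1.0)    # compare with 1/2 and 1/12
```

The computed values agree with $a/2$ and $a^2/12$ to many decimal places for both a = 3 and a = 1.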


Normal Distribution : Bell-shaped Curve 


The normal distribution is also called the “Gaussian” distribution. It is the most important 
of all probability density functions p(x). The reason for its overwhelming importance 
comes from repeating an experiment and averaging the outcomes. The experiments have 
their own distribution (like heads and tails). The average approaches a normal distribution. 


Central Limit Theorem (informal) The average of N samples of “any” probability
distribution approaches a normal distribution as $N \to \infty$.


Start with the “standard normal distribution”. It is symmetric around x = 0, so its mean
value is m = 0. It is chosen to have a standard variance $\sigma^2 = 1$. It is called N(0, 1).

$$\text{Standard normal distribution} \qquad p(x) = \frac{1}{\sqrt{2\pi}}\, e^{-x^2/2} \qquad (9)$$

$$\text{Total probability} = 1 \qquad \int_{-\infty}^{\infty} p(x)\,dx = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{-x^2/2}\,dx = 1$$

$$\text{Mean } E[x] = 0 \qquad m = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} x\,e^{-x^2/2}\,dx = 0$$

$$\text{Variance } E[x^2] = 1 \qquad \sigma^2 = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} (x - 0)^2 e^{-x^2/2}\,dx = 1$$

The zero mean was easy because we are integrating an odd function. Changing x to −x
shows that “integral = − integral”. So that integral must be m = 0.

The other two integrals apply the idea in Problem 12 to reach 1. Figure 12.2 shows
a graph of p(x) for the normal distribution N(0, $\sigma$) and also its cumulative distribution
F(x) = integral of p(x). From the symmetry of p(x) you see mean = zero. From F(x)
you see a very important practical approximation for opinion polling:


The probability that a random sample falls between $-\sigma$ and $\sigma$ is $F(\sigma) - F(-\sigma) \approx \frac23$.


This is because $\int_{-\sigma}^{\sigma} p(x)\,dx$ equals $\int_{-\infty}^{\sigma} p(x)\,dx - \int_{-\infty}^{-\sigma} p(x)\,dx = F(\sigma) - F(-\sigma)$.

Similarly, the probability that a random x lies between $-2\sigma$ and $2\sigma$ (“less than
two standard deviations from the mean”) is $F(2\sigma) - F(-2\sigma) \approx 0.95$. If you have an
experimental result further than $2\sigma$ from the mean, it is fairly sure to be not accidental:
chance = 0.05. Drug tests may look for a tighter confirmation, like probability 0.001.
Searching for the Higgs boson used a hyper-strict test of $5\sigma$ deviation from pure accident.
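
Those probabilities can be computed from the error function. This Python sketch relies on the standard identity $F(x) = \frac12\left[1 + \operatorname{erf}\left((x - m)/\sigma\sqrt2\right)\right]$ for the normal cumulative distribution and on `math.erf` from the standard library; the function name `normal_cdf` is illustrative.

```python
import math

def normal_cdf(x, m=0.0, sigma=1.0):
    """Cumulative distribution F(x) of the normal distribution N(m, sigma)."""
    return 0.5 * (1.0 + math.erf((x - m) / (sigma * math.sqrt(2.0))))

within_one_sigma = normal_cdf(1.0) - normal_cdf(-1.0)           # close to 2/3
within_two_sigma = normal_cdf(2.0) - normal_cdf(-2.0)           # close to 0.95
beyond_five_sigma = 1.0 - (normal_cdf(5.0) - normal_cdf(-5.0))  # below one in a million
```

The first two values are about 0.683 and 0.954, matching the rounded 2/3 and 0.95 in the text. The two-sided chance of a deviation beyond $5\sigma$ is less than one in a million.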


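[Figure 12.2 graphs: left panel, the bell-shaped density p(x); right panel, the cumulative
distribution $F(x) = \int_{-\infty}^{x} p(x)\,dx$ with $F(0) = \frac12$ and the levels .16, .84, .98 marked.
Both horizontal axes are marked at $-2\sigma$, $-\sigma$, 0, $\sigma$, $2\sigma$.]


Figure 12.2: The standard normal distribution p(x) has mean m = 0 and $\sigma = 1$.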


The normal distribution with any mean m and standard deviation $\sigma$ comes by shifting
and stretching the standard N(0, 1). Shift x to x − m. Stretch x − m to $(x - m)/\sigma$.

$$\text{Gaussian density } p(x), \text{ normal distribution } N(m, \sigma) \qquad p(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-(x-m)^2/2\sigma^2} \qquad (10)$$

The integral of p(x) is F(x), the probability that a random sample will fall below x.
The differential p(x) dx = F(x + dx) − F(x) is the probability that a random sample
will fall between x and x + dx. There is no simple formula to integrate $e^{-x^2/2}$, so
this cumulative distribution F(x) is computed and tabulated very carefully.


N Coin Flips and $N \to \infty$


Example 2 Suppose x is 1 or −1 with equal probabilities $p_1 = p_{-1} = \frac12$.


The mean value is $m = \frac12(1) + \frac12(-1) = 0$. The variance is $\sigma^2 = \frac12(1)^2 + \frac12(-1)^2 = 1$.


The key question is the average $A_N = (x_1 + \cdots + x_N)/N$. The independent $x_i$
are ±1 and we are dividing their sum by N. The expected mean of $A_N$ is still zero.
The law of large numbers says that this sample average approaches zero with probability 1.
How fast does $A_N$ approach zero? What is its variance $\sigma_N^2$?

$$\text{By linearity} \qquad \sigma_N^2 = \frac{\sigma^2}{N^2} + \frac{\sigma^2}{N^2} + \cdots + \frac{\sigma^2}{N^2} = N\,\frac{\sigma^2}{N^2} = \frac1N \quad \text{since } \sigma^2 = 1. \qquad (11)$$
Example 3 Change outputs from 1 or −1 to x = 1 or x = 0. Keep $p_1 = p_0 = \frac12$.

The new mean value $m = \frac12$ falls halfway between 0 and 1. The variance moves to $\sigma^2 = \frac14$:

$$m = \frac12(1) + \frac12(0) = \frac12 \quad\text{and}\quad \sigma^2 = \frac12\left(1 - \frac12\right)^2 + \frac12\left(0 - \frac12\right)^2 = \frac14.$$

$$\text{The average } A_N \text{ now has mean } \frac12 \text{ and variance } \frac{1}{4N^2} + \cdots + \frac{1}{4N^2} = \frac{N}{4N^2} = \frac{1}{4N} = \sigma_N^2. \qquad (12)$$

This $\sigma_N$ is half the size of $\sigma_N$ in Example 2. This must be correct because the new range
0 to 1 is half as long as −1 to 1. Examples 2-3 are showing a law of linearity.

The new 0 − 1 variable $x_{\text{new}}$ is $\frac12 x_{\text{old}} + \frac12$. So the mean m is increased to $\frac12$ and
the variance is multiplied by $\left(\frac12\right)^2$. A shift changes m and the rescaling changes $\sigma^2$.

$$\text{Linearity} \qquad x_{\text{new}} = a\,x_{\text{old}} + b \ \text{ has } \ m_{\text{new}} = a\,m_{\text{old}} + b \ \text{ and } \ \sigma^2_{\text{new}} = a^2\sigma^2_{\text{old}}. \qquad (13)$$

Here are the results from three numerical tests: random 0 or 1 averaged over N trials.

[48 1’s from N = 100] [5035 1’s from N = 10000] [19967 1’s from N = 40000].
The standardized $X = (x - m)/\sigma = (A_N - \frac12)/\sigma_N = 2\sqrt{N}\,(A_N - \frac12)$ was [−.40] [.70] [−.33].
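
The three standardized values follow from $\sigma_N = 1/(2\sqrt N)$ in equation (12). This Python sketch recomputes them from the head counts quoted above; the function name `standardized` is illustrative.

```python
import math

def standardized(ones, flips):
    """Return X = (A_N - 1/2) / sigma_N with sigma_N = 1 / (2 sqrt(N))."""
    average = ones / flips
    sigma_N = 1.0 / (2.0 * math.sqrt(flips))
    return (average - 0.5) / sigma_N

tests = [(48, 100), (5035, 10000), (19967, 40000)]
values = [standardized(ones, flips) for ones, flips in tests]
```

Rounded to two decimals the values are −0.40, 0.70, and −0.33, all well inside two standard deviations.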


The Central Limit Theorem says that the average of many coin flips will approach a 
normal distribution. Let us begin to see how that happens: binomial approaches normal. 


For each flip, the probability of heads is $\frac12$. For N = 3 flips, the probability of heads
all three times is $\left(\frac12\right)^3 = \frac18$. The probability of heads twice and tails once is $\frac38$,
from three sequences HHT and HTH and THH. These numbers $\frac18$ and $\frac38$ are pieces of
$\left(\frac12 + \frac12\right)^3 = \frac18 + \frac38 + \frac38 + \frac18 = 1$. The average number of heads in 3 flips is 1.5.

$$\text{Mean} \qquad m = (3 \text{ heads})\frac18 + (2 \text{ heads})\frac38 + (1 \text{ head})\frac38 + 0 = \frac38 + \frac68 + \frac38 = 1.5 \text{ heads}$$

With N flips, Example 3 (or common sense) gives a mean of $m = \sum x_ip_i = \frac12 N$ heads.

The variance $\sigma^2$ is based on the squared distance from this mean N/2. With N = 3
the variance is $\sigma^2 = \frac34$ (which is N/4). To find $\sigma^2$ we add $(x_i - m)^2 p_i$ with m = 1.5:

$$\sigma^2 = (3 - 1.5)^2\,\frac18 + (2 - 1.5)^2\,\frac38 + (1 - 1.5)^2\,\frac38 + (0 - 1.5)^2\,\frac18 = \frac{9 + 3 + 3 + 9}{32} = \frac34.$$

For any N, the variance is $\sigma_N^2 = N/4$. Then $\sigma_N = \sqrt{N}/2$.

Figure 12.3 shows how the probabilities of 0, 1, 2, 3, 4 heads in N = 4 flips come
close to a bell-shaped Gaussian. That Gaussian is centered at the mean value N/2 = 2.
To reach the standard Gaussian (mean 0 and variance 1) we shift and rescale that graph.
If x is the number of heads in N flips (the sum of N zero-one outcomes) then x is
shifted by its mean m = N/2 and rescaled by $\sigma = \sqrt{N}/2$ to produce the standard X:

$$\text{Shifted and scaled} \qquad X = \frac{x - m}{\sigma} = \frac{x - \frac12 N}{\sqrt{N}/2} \qquad (N = 4 \text{ has } X = x - 2)$$

Subtracting m is “centering” or “detrending”. The mean of X is zero.

Dividing by $\sigma$ is “normalizing” or “standardizing”. The variance of X is 1.


[Figure 12.3 graphs: a uniform density p(x) = 1, and the binomial probabilities for
M heads in N flips (M = 0 to N, centered at N/2) approaching a Gaussian, with center
height $p_{N/2} \approx \sqrt{2/\pi N}$.]


Figure 12.3: The probabilities p = (1, 4, 6, 4, 1)/16 for the number of heads in 4 flips.
These $p_i$ approach a Gaussian distribution with variance $\sigma^2 = N/4$ centered at m = N/2.
For X, the Central Limit Theorem gives convergence to the normal distribution N(0, 1).


It is fun to see the Central Limit Theorem giving the right answer at the center point
X = 0. At that point, the factor $e^{-X^2/2}$ equals 1. We know that the variance for N coin
flips is $\sigma^2 = N/4$. The center of the bell-shaped curve has height $1/\sqrt{2\pi\sigma^2} = \sqrt{2/N\pi}$.

What is the height at the center of the coin-flip distribution $p_0$ to $p_N$ (the binomial
distribution)? For N = 4, the probabilities for 0, 1, 2, 3, 4 heads come from $\left(\frac12 + \frac12\right)^4$:

$$\text{Center probability } \frac{6}{16} \qquad \left(\frac12 + \frac12\right)^4 = \frac{1}{16} + \frac{4}{16} + \frac{6}{16} + \frac{4}{16} + \frac{1}{16} = 1.$$

The binomial theorem in Problem 8 tells us the center probability $p_{N/2}$ for any even N:

$$\text{The center probability } \left(\tfrac N2 \text{ heads}, \tfrac N2 \text{ tails}\right) \text{ is } \quad \frac{1}{2^N}\,\frac{N!}{(N/2)!\,(N/2)!}$$

For N = 4, those factorials produce 4!/2! 2! = 24/4 = 6. For large N, Stirling’s formula
$\sqrt{2\pi N}\,(N/e)^N$ is a close approximation to N!. Use Stirling for N and twice for N/2:

$$\text{Limit of coin-flip center probability} \qquad p_{N/2} \approx \frac{1}{2^N}\,\frac{\sqrt{2\pi N}\,(N/e)^N}{\pi N\,(N/2e)^N} = \frac{\sqrt2}{\sqrt{\pi N}} = \frac{1}{\sqrt{2\pi}\,\sigma}.$$

At that last step we used the variance $\sigma^2 = N/4$ for the coin-tossing problem. The result
$1/\sqrt{2\pi}\,\sigma$ matches the center value (above) for the Gaussian. The Central Limit Theorem
is true: The “binomial distribution” approaches the normal distribution as $N \to \infty$.
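
The match at the center can be seen numerically. This Python sketch compares the exact binomial center probability with the Gaussian height $\sqrt{2/\pi N}$; it uses `math.comb` from the standard library (Python 3.8 or later), and the function names are illustrative.

```python
import math

def center_probability(N):
    """Exact probability of N/2 heads in N fair flips (N even)."""
    return math.comb(N, N // 2) / 2 ** N

def gaussian_center_height(N):
    """Height 1/(sqrt(2 pi) sigma) of the Gaussian with sigma^2 = N/4."""
    return math.sqrt(2.0 / (math.pi * N))

ratios = {N: center_probability(N) / gaussian_center_height(N)
          for N in (4, 100, 1000)}
```

For N = 4 the exact value is 6/16 = 0.375 against a Gaussian height of about 0.399. The ratio moves toward 1 as N grows.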


Monte Carlo Estimation Methods 


Scientific computing has to work with errors in the data. Financial computing has to work 
with unsure numbers and uncertain predictions. All of applied mathematics has moved 
to accepting uncertainty in the inputs and estimating the variance in the outputs. 
How to estimate that variance? Often probability distributions p(x) are not known. 
What we can do is to try different inputs b and compute the outputs x and take an average.
This is the simplest form of a Monte Carlo method (named after the gambling palace
on the Riviera, where I once saw a fight about whether the bet was placed in time).
Monte Carlo approximates an expected value $E[x]$ by a sample average $(x_1 + \cdots + x_N)/N$.


Please understand that every $x_k$ can be expensive to compute. We are not just
flipping coins. Each sample comes from a set of data $b_k$. Monte Carlo randomly chooses this
data $b_k$, it computes the outputs $x_k$, and then it averages those x’s. Decent accuracy for
$E[x]$ often requires many samples b and huge computing cost. The error in approximating
$E[x]$ by $(x_1 + \cdots + x_N)/N$ is normally of order $1/\sqrt{N}$. Slow improvement as N increases.

That $1/\sqrt{N}$ estimate came for coin flips in equation (11). Averaging N independent
samples $x_k$ of variance $\sigma^2$ reduces the variance to $\sigma^2/N$.
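
The $\sigma^2/N$ rule can be observed by repeating a small Monte Carlo experiment many times. This Python sketch is illustrative: it uses 0 or 1 coin flips (so $\sigma^2 = \frac14$), a fixed seed chosen arbitrarily so the run is repeatable, and made-up names. The measured variance is itself a random estimate, so it matches $1/4N$ only approximately.

```python
import random

def average_of_flips(N, rng):
    """Average A_N of N independent flips, each 0 or 1 with probability 1/2."""
    return sum(rng.randint(0, 1) for _ in range(N)) / N

rng = random.Random(2024)            # fixed seed so the run is repeatable
N, trials = 100, 1000
averages = [average_of_flips(N, rng) for _ in range(trials)]

mean_of_averages = sum(averages) / trials
variance_of_averages = sum((a - mean_of_averages) ** 2 for a in averages) / (trials - 1)
predicted_variance = 1 / (4 * N)     # sigma^2 / N with sigma^2 = 1/4
```

Raising N from 100 to 10000 would cut the predicted variance by a factor of 100 but the typical error only by a factor of 10. That is the slow $1/\sqrt N$ improvement.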

“Quasi-Monte Carlo” can sometimes reduce this variance to $\sigma^2/N^2$: a big difference!
The inputs b are selected very carefully, not just randomly. This QMC approach is
surveyed in the journal Acta Numerica 2013. The newer idea of “Multilevel Monte Carlo”
is outlined by Michael Giles in Acta Numerica 2015. Here is how it works.


Suppose it is much simpler to simulate another variable y(b) close to z(b). Then use 
N computations of y(b,) and only N* < N computations of x(b;) to estimate E[z]. 


N N* 
2-level Monte Carlo E[a] = = >D y(bk) + T ` [x(bk) — y(bk)]. 


1 1 
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The idea is that x — y has a smaller variance o* than the original x. Therefore N* can 
be smaller than NV, with the same accuracy for E[z]. We do N cheap simulations to find 
the y’s. Those cost C each. We only do N* expensive simulations involving x’s. Those 
cost C* each. The total computing cost is NC + N*C*. 

Calculus minimizes the overall variance for a fixed total cost. The optimal ratio $N^*/N$
is $\sqrt{C/C^*}\;\sigma^*/\sigma$. Three-level Monte Carlo would simulate x, y, and z:

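$$E[x] \approx \frac{1}{N}\sum_{k=1}^{N} z(b_k) + \frac{1}{N^*}\sum_{k=1}^{N^*}\left[y(b_k) - z(b_k)\right] + \frac{1}{N^{**}}\sum_{k=1}^{N^{**}}\left[x(b_k) - y(b_k)\right]$$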

Giles optimizes $N, N^*, N^{**}, \ldots$ to keep $E[x] \le$ fixed $E_0$, and provides a MATLAB code.
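
The 2-level formula above can be tried on a toy problem. This is only a sketch under stated assumptions: the "expensive" output x(b) = e^b, the cheap approximation y(b) = 1 + b + b²/2, the uniform inputs b on [0, 1], and the function names are all hypothetical choices made for illustration. They are not taken from the Giles code.

```python
import math
import random

def expensive_x(b):
    # Hypothetical stand-in for a costly simulation output x(b).
    return math.exp(b)

def cheap_y(b):
    # Hypothetical cheap approximation y(b) close to x(b): a short Taylor polynomial.
    return 1.0 + b + 0.5 * b * b

def two_level_estimate(n_cheap, n_expensive, rng):
    # First sum: (1/N) * sum of y(b_k) over N cheap samples.
    cheap_avg = sum(cheap_y(rng.random()) for _ in range(n_cheap)) / n_cheap
    # Second sum: (1/N*) * sum of [x(b_k) - y(b_k)] over N* expensive samples.
    correction = 0.0
    for _ in range(n_expensive):
        b = rng.random()
        correction += expensive_x(b) - cheap_y(b)
    return cheap_avg + correction / n_expensive

rng = random.Random(0)                        # fixed seed so the run is repeatable
estimate = two_level_estimate(20000, 500, rng)
exact = math.e - 1.0                          # E[e^b] for b uniform on [0, 1]
print(estimate, exact)
```

Because x − y is small here, 500 expensive samples of the correction are enough, while the 20000 cheap samples carry most of the averaging.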


Review: Three Formulas for the Mean and the Variance 


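The formulas for m and $\sigma^2$ are the starting point for all of probability and statistics. There
are three different cases to keep straight: sample values $X_i$, expected values (discrete $p_i$),
and a range of expected values (continuous p(x)). Here are the mean and the variance:

Samples $X_1$ to $X_N$
$$m = \frac{X_1 + \cdots + X_N}{N} \qquad S^2 = \frac{(X_1 - m)^2 + \cdots + (X_N - m)^2}{N - 1}$$

n possible outputs with probabilities $p_i$
$$m = \sum_{i=1}^{n} p_i x_i \qquad \sigma^2 = \sum_{i=1}^{n} p_i (x_i - m)^2$$

Range of outputs with probability density p(x)
$$m = \int x\, p(x)\, dx \qquad \sigma^2 = \int (x - m)^2\, p(x)\, dx$$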


A natural question: Why are there no probabilities p on the first line? How can these
formulas be parallel? Answer: We expect a fraction $p_i$ of the samples to be $X = x_i$. If
this is exactly true, $X = x_i$ is repeated $p_i N$ times. Then lines 1 and 2 give the same m.


When we work with samples, we don’t know the $p_i$. We just include each output X
as often as it comes. We get the “empirical” mean instead of the expected mean. 
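
A short sketch of the sample formulas next to the discrete expected-value formulas. The fair die and the function names are hypothetical choices for illustration. The samples are built so that each output appears exactly $p_i N$ times, the situation described in the answer above.

```python
def sample_mean_and_variance(samples):
    # Sample mean, and sample variance S^2 with division by N - 1.
    n = len(samples)
    m = sum(samples) / n
    s2 = sum((x - m) ** 2 for x in samples) / (n - 1)
    return m, s2

def expected_mean_and_variance(outputs, probs):
    # Expected mean and variance from outputs x_i with probabilities p_i.
    m = sum(p * x for x, p in zip(outputs, probs))
    var = sum(p * (x - m) ** 2 for x, p in zip(outputs, probs))
    return m, var

outputs = [1, 2, 3, 4, 5, 6]       # hypothetical example: a fair die
probs = [1 / 6] * 6
m_expected, var_expected = expected_mean_and_variance(outputs, probs)

samples = outputs * 2              # N = 12 samples, each output repeated p_i * N = 2 times
m_sample, s2_sample = sample_mean_and_variance(samples)
print(m_expected, m_sample, var_expected, s2_sample)
```

The two means agree. The two variances differ by the factor N/(N − 1), because the sample formula divides by N − 1.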


Problem Set 12.1 


1 Add 7 to every output x. What happens to the mean and the variance?
What are the new sample mean, the new expected mean, and the new variance? 


2 We know: 1/3 of all integers are divisible by 3 and 1/7 of integers are divisible by 7.


What fraction of integers will be divisible by 3 or 7 or both? 


3 Suppose you sample from the numbers 1 to 1000 with equal probabilities 1/1000. 
What are the probabilities $p_0$ to $p_9$ that the last digit of your sample is 0,...,9?
What is the expected mean m of that last digit? What is its variance $\sigma^2$?


4 Sample again from 1 to 1000 but look at the last digit of the sample squared. That 
square could end with x = 0, 1, 4, 5, 6, or 9. What are the probabilities $p_0, p_1, p_4, p_5, p_6, p_9$?
What are the (expected) mean m and variance $\sigma^2$ of that number x?


5 (a little tricky) Sample again from 1 to 1000 with equal probabilities and let x be the
first digit (x = 1 if the number is 15). What are the probabilities $p_1$ to $p_9$ (adding
to 1) of x = 1,...,9? What are the mean and variance of x?


6 Suppose you have N = 4 samples 157, 312, 696, 602 in Problem 5. What are the
first digits $x_1$ to $x_4$ of the squares? What is the sample mean $\mu$? What is the sample
variance $S^2$? Remember to divide by N − 1 = 3 and not N = 4.


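7 Equation (4) gave a second equivalent form for $S^2$ (the variance using samples):

$$S^2 = \frac{1}{N-1}\,\text{sum of } (x_i - m)^2 = \frac{1}{N-1}\left[\left(\text{sum of } x_i^2\right) - N m^2\right].$$

Verify the matching identity for the expected variance $\sigma^2$ (using $m = \sum p_i x_i$):

$$\sigma^2 = \text{sum of } p_i (x_i - m)^2 = \left(\text{sum of } p_i x_i^2\right) - m^2.$$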


8 If all 24 samples from a population produce the same age x = 20, what are the
sample mean $\mu$ and the sample variance $S^2$? What if x = 20 or 21, 12 times each?


9 Computer experiment as on page 541: Find the average $A_{1000000}$ of a million
random 0-1 samples! What is $X = (A_N - \frac{1}{2})/2\sqrt{N}$?


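10 The probability $p_i$ to get i heads in N coin flips is the binomial number $b_i = \binom{N}{i}$
divided by $2^N$. The $b_i$ add to $(1 + 1)^N = 2^N$ so the probabilities $p_i$ add to 1.

$$p_0 + \cdots + p_N = \left(\tfrac{1}{2} + \tfrac{1}{2}\right)^N = \frac{1}{2^N}(b_0 + \cdots + b_N) \quad\text{with}\quad b_i = \frac{N!}{i!\,(N-i)!}$$

$$N = 4 \text{ leads to } b_0 = \frac{24}{24},\quad b_1 = \frac{24}{(1)(6)} = 4,\quad b_2 = \frac{24}{(2)(2)} = 6,\quad p_i = \frac{1}{16}(1, 4, 6, 4, 1).$$

Notice $b_i = b_{N-i}$. Problem: Confirm that the mean $m = 0p_0 + \cdots + Np_N$ equals $N/2$.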


11 For any function f(x) the expected value is $E[f] = \sum p_i f(x_i)$ or $\int p(x) f(x)\,dx$
(discrete probability or continuous probability). Suppose the mean is E[x] = m and
the variance is $E[(x - m)^2] = \sigma^2$. What is $E[x^2]$?


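12 Show that the standard normal distribution p(x) has total probability $\int p(x)\,dx = 1$
as required. A famous trick multiplies $\int p(x)\,dx$ by $\int p(y)\,dy$ and computes the
integral over all x and all y ($-\infty$ to $\infty$). The trick is to replace dx dy in that double
integral by $r\,dr\,d\theta$ (polar coordinates with $x^2 + y^2 = r^2$). Explain each step:

$$2\pi \int_{-\infty}^{\infty} p(x)\,dx \int_{-\infty}^{\infty} p(y)\,dy = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} e^{-x^2/2}\, e^{-y^2/2}\,dx\,dy = \int_{\theta=0}^{2\pi}\int_{r=0}^{\infty} e^{-r^2/2}\, r\,dr\,d\theta = 2\pi.$$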

12.2 Covariance Matrices and Joint Probabilities


Linear algebra enters when we run M different experiments at once. We might measure 
age and height and weight (M = 3 measurements of N people). Each experiment 
has its own mean value. So we have a vector $m = (m_1, m_2, m_3)$ containing the M
mean values. Those could be sample means of age and height and weight. Or $m_1, m_2, m_3$
could be expected values of age, height, weight based on known probabilities.


A matrix becomes involved when we look at variances. Each experiment will have a
sample variance $S_i^2$ or an expected $\sigma_i^2 = E[(x_i - m_i)^2]$ based on the squared distance
from its mean. Those M numbers $\sigma_1^2, \ldots, \sigma_M^2$ will go on the main diagonal of the matrix.
So far we have made no connection between the M parallel experiments. They measure 
M different random variables, but the experiments are not necessarily independent! 

If we measure age and height and weight (a, h, w) for children, the results will be 
strongly correlated. Older children are generally taller and heavier. Suppose the means
$m_a, m_h, m_w$ are known. Then $\sigma_a^2, \sigma_h^2, \sigma_w^2$ are the separate variances in age, height, weight.
The new numbers are the covariances like $\sigma_{ah}$, where age multiplies height.


Covariance
$$\sigma_{ah} = E\big[(\text{age} - \text{mean age})\,(\text{height} - \text{mean height})\big]. \tag{1}$$


This definition needs a close look. To compute $\sigma_{ah}$, it is not enough to know the
probability of each age and the probability of each height. We have to know the joint 
probability of each pair (age and height). This is because age is connected to height. 


$p_{ah}$ = probability that a random child has age = a and height = h: both at once
$p_{ij}$ = probability that experiment 1 produces $x_i$ and experiment 2 produces $y_j$


Suppose experiment 1 (age) has mean $m_1$. Experiment 2 (height) has mean $m_2$. The
covariance in (1) between experiments 1 and 2 looks at all pairs of ages $x_i$, heights $y_j$:

Covariance
$$\sigma_{12} = \sum_{\text{all } i,j} p_{ij}\,(x_i - m_1)(y_j - m_2) \tag{2}$$


To capture this idea of “joint probability $p_{ij}$” we begin with two small examples.


Example 1 Flip two coins separately. With 1 for heads and 0 for tails, the results can be
(1,1) or (1,0) or (0,1) or (0,0). Those four outcomes all have probability
$p_{11} = p_{10} = p_{01} = p_{00} = \frac{1}{4}$. Independent experiments have Prob of (i, j) = (Prob of i) (Prob of j).


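Example 2 Glue the coins together, facing the same way. The only possibilities are
(1,1) and (0,0). Those have probabilities $\frac{1}{2}$ and $\frac{1}{2}$. The probabilities $p_{10}$ and $p_{01}$ are zero.
(1,0) and (0,1) won’t happen because the coins stick together: both heads or both tails.

Probability matrices for Examples 1 and 2
$$P = \begin{bmatrix} p_{11} & p_{10} \\ p_{01} & p_{00} \end{bmatrix} = \begin{bmatrix} \frac14 & \frac14 \\ \frac14 & \frac14 \end{bmatrix} \quad\text{and}\quad P = \begin{bmatrix} \frac12 & 0 \\ 0 & \frac12 \end{bmatrix}.$$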

Let me stay longer with P, to show it in good matrix notation. The matrix shows the probability
$p_{ij}$ of each pair $(x_i, y_j)$, starting with $(x_1, y_1)$ = (heads, heads) and $(x_1, y_2)$ =
(heads, tails). Notice the row sums $p_i$ and column sums $P_j$ and the total sum = 1.


Probability matrix
$$P = \begin{bmatrix} p_{11} & p_{12} \\ p_{21} & p_{22} \end{bmatrix} \qquad \begin{array}{l} p_{11} + p_{12} = p_1 \\ p_{21} + p_{22} = p_2 \end{array} \quad (\text{first coin})$$
(second coin) column sums $P_1$, $P_2$; the 4 entries add to 1.

Those numbers $p_1, p_2$ and $P_1, P_2$ are called the marginals of the matrix P:

$p_1 = p_{11} + p_{12}$ = chance of heads from coin 1 (coin 2 can be heads or tails)
$P_1 = p_{11} + p_{21}$ = chance of heads from coin 2 (coin 1 can be heads or tails)


Example 1 showed independent variables. Every probability $p_{ij}$ equals $p_i$ times $P_j$
($\frac12$ times $\frac12$ gave $p_{ij} = \frac14$ in that example). In this case the covariance $\sigma_{12}$ will be zero.
Heads or tails from the first coin gave no information about the second coin. 


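Zero covariance $\sigma_{12}$ for independent trials

Independent experiments have $\sigma_{12} = 0$ because every $p_{ij} = (p_i)(P_j)$ in equation (2):

$$\sigma_{12} = \sum_i \sum_j (p_i)(P_j)(x_i - m_1)(y_j - m_2) = \Big[\sum_i p_i (x_i - m_1)\Big]\Big[\sum_j P_j (y_j - m_2)\Big] = [0][0].$$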

The glued coins show perfect correlation. Heads on one means heads on the other. 
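The covariance $\sigma_{12}$ moves from 0 to $\sigma_1 \sigma_2 = \frac14$. This is the largest possible value of $\sigma_{12}$:

$$\text{Means} = \tfrac12 \qquad \sigma_{12} = \tfrac12\left(1 - \tfrac12\right)\left(1 - \tfrac12\right) + 0 + 0 + \tfrac12\left(0 - \tfrac12\right)\left(0 - \tfrac12\right) = \tfrac14$$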


Heads or tails from coin 1 gives complete information about heads or tails from coin 2: 


Glued coins give largest possible covariances.
Singular covariance matrix: determinant = 0
$$V_{\text{glue}} = \begin{bmatrix} \sigma_1^2 & \sigma_1\sigma_2 \\ \sigma_1\sigma_2 & \sigma_2^2 \end{bmatrix}$$


Always $\sigma_1^2 \sigma_2^2 \ge \sigma_{12}^2$. Thus $\sigma_{12}$ is between $-\sigma_1\sigma_2$ and $\sigma_1\sigma_2$. The covariance matrix V
is positive definite (or in this singular case of glued coins, V is positive semidefinite). 
That is an important fact about M by M covariance matrices for M experiments. 
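
The two coin examples can be checked numerically. This is a minimal sketch, assuming outputs 1 for heads and 0 for tails. The helper name `covariance_matrix` is hypothetical. It applies the covariance sum in equation (2), and the same sum with squares for the two variances.

```python
def covariance_matrix(P, xs, ys):
    # Marginals: row sums for experiment 1, column sums for experiment 2.
    row_sums = [sum(row) for row in P]
    col_sums = [sum(P[i][j] for i in range(len(xs))) for j in range(len(ys))]
    m1 = sum(p * x for p, x in zip(row_sums, xs))
    m2 = sum(p * y for p, y in zip(col_sums, ys))
    v11 = v12 = v22 = 0.0
    for i, x in enumerate(xs):
        for j, y in enumerate(ys):
            v11 += P[i][j] * (x - m1) ** 2          # variance of experiment 1
            v12 += P[i][j] * (x - m1) * (y - m2)    # covariance, equation (2)
            v22 += P[i][j] * (y - m2) ** 2          # variance of experiment 2
    return [[v11, v12], [v12, v22]]

outcomes = [1, 0]   # heads = 1, tails = 0
V_indep = covariance_matrix([[0.25, 0.25], [0.25, 0.25]], outcomes, outcomes)
V_glued = covariance_matrix([[0.5, 0.0], [0.0, 0.5]], outcomes, outcomes)
det_glued = V_glued[0][0] * V_glued[1][1] - V_glued[0][1] ** 2
print(V_indep, V_glued, det_glued)
```

Independent coins give a diagonal matrix. Glued coins give covariance 1/4 and a zero determinant.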


Note that the sample covariance matrix S from N trials is certainly semidefinite.
Every new sample X = (age, height, weight) contributes to the sample mean $\overline{X}$ and to S.
Each term $(X_i - \overline{X})(X_i - \overline{X})^T$ is positive semidefinite and we just add to reach S:

$$S = \frac{(X_1 - \overline{X})(X_1 - \overline{X})^T + \cdots + (X_N - \overline{X})(X_N - \overline{X})^T}{N - 1} \tag{3}$$

The Covariance Matrix V is Positive Semidefinite


Come back to the expected covariance $\sigma_{12}$ between two experiments 1 and 2 (two coins):

$$\sigma_{12} = \text{expected value of } [(\text{output 1} - \text{mean 1}) \text{ times } (\text{output 2} - \text{mean 2})] = \sum_{\text{all } i,j} p_{ij}\,(x_i - m_1)(y_j - m_2). \tag{4}$$


$p_{ij} \ge 0$ is the probability of seeing output $x_i$ in experiment 1 and $y_j$ in experiment 2.
Some pair of outputs must appear. Therefore the $N^2$ probabilities $p_{ij}$ add to 1.

Total probability (all pairs) is 1
$$\sum_{\text{all } i,j} p_{ij} = 1 \tag{5}$$


Here is another fact we need. Fix on one particular output $x_i$ in experiment 1. Allow
all outputs $y_j$ in experiment 2. Add the probabilities of $(x_i, y_1), (x_i, y_2), \ldots, (x_i, y_n)$:

Row sum $p_i$ of P
$$\sum_{j=1}^{n} p_{ij} = \text{probability } p_i \text{ of } x_i \text{ in experiment 1.} \tag{6}$$


Some $y_j$ must happen in experiment 2! Whether the two coins are completely separate or
glued together, we still get $\frac12$ for the probability $p_H = p_{HH} + p_{HT}$ that coin 1 is heads:

$$\text{(separate)}\quad p_{HH} + p_{HT} = \tfrac14 + \tfrac14 = \tfrac12 \qquad \text{(glued)}\quad p_{HH} + p_{HT} = \tfrac12 + 0 = \tfrac12$$


That basic reasoning allows us to write one matrix formula that includes the covariance 
$\sigma_{12}$ along with the separate variances $\sigma_1^2$ and $\sigma_2^2$ for experiment 1 and experiment 2.
We get the whole covariance matrix V by adding the matrices $V_{ij}$ for each pair (i, j):

Covariance matrix $V = \sum\sum V_{ij}$
$$V = \sum_{\text{all } i,j} p_{ij} \begin{bmatrix} (x_i - m_1)^2 & (x_i - m_1)(y_j - m_2) \\ (x_i - m_1)(y_j - m_2) & (y_j - m_2)^2 \end{bmatrix} \tag{7}$$


Off the diagonal, this is equation (2) for the covariance $\sigma_{12}$. On the diagonal, we are
getting the ordinary variances $\sigma_1^2$ and $\sigma_2^2$. I will show in detail how we get $V_{11} = \sigma_1^2$ by
using equation (6). Allowing all j just leaves the probability $p_i$ of $x_i$ in experiment 1:

$$V_{11} = \sum_{\text{all } i,j} p_{ij}\,(x_i - m_1)^2 = \sum_{\text{all } i} (\text{probability of } x_i)\,(x_i - m_1)^2 = \sigma_1^2. \tag{8}$$


Please look at that twice. It is the key to producing the whole covariance matrix by 
one formula (7). The beauty of that formula is that it combines 2 by 2 matrices $V_{ij}$.
And the matrix $V_{ij}$ in (7) for each pair of outcomes i, j is positive semidefinite:

$V_{ij}$ has diagonal entries $p_{ij}(x_i - m_1)^2 \ge 0$ and $p_{ij}(y_j - m_2)^2 \ge 0$ and $\det(V_{ij}) = 0$.

That matrix $V_{ij}$ has rank 1. Equation (7) multiplies $p_{ij}$ times column U times row $U^T$:

$$\begin{bmatrix} (x_i - m_1)^2 & (x_i - m_1)(y_j - m_2) \\ (x_i - m_1)(y_j - m_2) & (y_j - m_2)^2 \end{bmatrix} = \begin{bmatrix} x_i - m_1 \\ y_j - m_2 \end{bmatrix} \begin{bmatrix} x_i - m_1 & y_j - m_2 \end{bmatrix} \tag{9}$$

Every matrix $UU^T$ is positive semidefinite. So the whole matrix V (combining these
matrices $UU^T$ with weights $p_{ij} \ge 0$) is at least semidefinite, and probably V is definite.


The covariance matrix V is positive definite unless the experiments are dependent. 


Now we move from two variables x and y to M variables like age-height-weight. 
The output from each trial is a vector X with M components. (Each child has an age- 
height-weight vector with 3 components.) The covariance matrix V is now M by M. 
V is created from the output vectors X and their average $\overline{X} = E[X]$:

Covariance matrix
$$V = E\left[(X - \overline{X})(X - \overline{X})^T\right] \tag{10}$$

Remember that $XX^T$ and $\overline{X}\,\overline{X}^T$ = (column)(row) are M by M matrices.

For M = 1 (one variable) you see that $\overline{X}$ is the mean m and V is $\sigma^2$ (Section 12.1).
For M = 2 (two coins) you see that $\overline{X}$ is $(m_1, m_2)$ and V matches equation (10). The
expectation E always adds up outputs times their probabilities. For age-height-weight
the output could be X = (5 years, 31 inches, 48 pounds) and its probability is $p_{5,31,48}$.

Now comes a new idea. Take any linear combination $c^T X = c_1 X_1 + \cdots + c_M X_M$.
With c = (6, 2, 5) this would be $c^T X$ = 6 (age) + 2 (height) + 5 (weight). By linearity
we know that its expected value $E[c^T X]$ is $c^T E[X] = c^T \overline{X}$:

$E[c^T X] = c^T E[X]$ = 6 (expected age) + 2 (expected height) + 5 (expected weight).

More than that, we also know the variance $\sigma^2$ of that number $c^T X$:

$$\text{Variance of } c^T X = E\left[(c^T X - c^T \overline{X})(c^T X - c^T \overline{X})^T\right] = c^T E\left[(X - \overline{X})(X - \overline{X})^T\right] c = c^T V c\,! \tag{11}$$


Now the key point: The variance of $c^T X$ can never be negative. So $c^T V c \ge 0$.
The covariance matrix V is therefore positive semidefinite by the energy test $c^T V c \ge 0$.

Covariance matrices V open up the link between probability and linear algebra:
V equals $Q \Lambda Q^T$ with eigenvalues $\lambda_i \ge 0$ and orthonormal eigenvectors $q_1$ to $q_M$.


Diagonalizing the covariance matrix means finding M independent 
experiments as combinations of the original M experiments. 
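
A small numerical check of the energy test. This is a sketch under stated assumptions: two experiments with outputs 0 and 1, a made-up joint probability matrix P, and a made-up combination c = (6, 2). It builds V as the weighted sum of rank-one pieces $p_{ij} UU^T$ and compares $c^T V c$ with the variance of $c^T X$ computed directly.

```python
P = [[0.1, 0.3], [0.2, 0.4]]    # hypothetical joint probabilities p_ij (they add to 1)
xs = [0.0, 1.0]                 # outputs of experiment 1
ys = [0.0, 1.0]                 # outputs of experiment 2
c = (6.0, 2.0)                  # hypothetical combination c^T X = 6x + 2y

pairs = [(i, j) for i in range(2) for j in range(2)]
m1 = sum(P[i][j] * xs[i] for i, j in pairs)
m2 = sum(P[i][j] * ys[j] for i, j in pairs)

# V = sum over all pairs of p_ij * U U^T with U = (x_i - m1, y_j - m2)
V = [[0.0, 0.0], [0.0, 0.0]]
for i, j in pairs:
    U = (xs[i] - m1, ys[j] - m2)
    for r in range(2):
        for s in range(2):
            V[r][s] += P[i][j] * U[r] * U[s]

# Energy c^T V c
energy = sum(c[r] * V[r][s] * c[s] for r in range(2) for s in range(2))

# Variance of the single number c^T X, computed directly from the joint probabilities
mean_cx = c[0] * m1 + c[1] * m2
direct = sum(P[i][j] * (c[0] * xs[i] + c[1] * ys[j] - mean_cx) ** 2 for i, j in pairs)
print(V, energy, direct)
```

The two numbers agree, and neither can be negative because each is a sum of squares with weights $p_{ij} \ge 0$.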

Confession I am not entirely happy with that proof based on $c^T V c \ge 0$. The expectation
symbol E is burying the key idea of joint probability. Allow me to show directly that V is
positive semidefinite (at least for the age-height-weight example). The proof is simply that
V is the sum of the joint probability $p_{ahw}$ of each combination (age, height, weight)
times the positive semidefinite matrix $UU^T$. Here U is $X - \overline{X}$:

$$V = \sum_{\text{all } a,h,w} p_{ahw}\, UU^T \quad\text{with}\quad U = \begin{bmatrix} \text{age} \\ \text{height} \\ \text{weight} \end{bmatrix} - \begin{bmatrix} \text{mean age} \\ \text{mean height} \\ \text{mean weight} \end{bmatrix}. \tag{12}$$


This is exactly like the 2 by 2 coin flip matrix V in equation (7). Now M = 3. 


The value of the expectation symbol E is that it also allows pdf’s (probability density
functions like p(x, y, z) for continuous random variables x and y and z). If we allow all
numbers as ages and heights and weights, instead of age i = 0, 1, 2, 3, ..., then we need
p(x, y, z) instead of $p_{ijk}$. The sums in this section of the book would all change to integrals.
But we still have $V = E[UU^T]$:

Covariance matrix
$$V = \iiint p(x, y, z)\, UU^T\, dx\, dy\, dz \quad\text{with}\quad U = \begin{bmatrix} x - \overline{x} \\ y - \overline{y} \\ z - \overline{z} \end{bmatrix} \tag{13}$$


Always $\iiint p = 1$. Examples 1-2 emphasized how p can give diagonal V or singular V:
Independent variables x, y, z: $p(x, y, z) = p_1(x)\, p_2(y)\, p_3(z)$.
Dependent variables x, y, z: $p(x, y, z) = 0$ except when $cx + dy + ez = 0$.


The Mean and Variance of z = x + y 


Start with the sample mean. We have N samples of x. Their mean (= average) is $m_x$.
We also have N samples of y and their mean is $m_y$. The sample mean of z = x + y
is clearly $m_z = m_x + m_y$:

Mean of sum = Sum of means
$$\frac{1}{N}\sum_{i=1}^{N} (x_i + y_i) = \frac{1}{N}\sum_{i=1}^{N} x_i + \frac{1}{N}\sum_{i=1}^{N} y_i. \tag{14}$$


Nice to see something that simple. The expected mean of z = x + y doesn’t look so 
simple, but it must come out as E[z] = E[x] + E[y]. Here is one way to see this. 

The joint probability of the pair $(x_i, y_j)$ is $p_{ij}$. Its value depends on whether the experiments
are independent, which we don’t know. But for the mean of the sum z = x + y,
dependence or independence of x and y doesn’t matter. Expected values still add:

$$E[x + y] = \sum_i \sum_j p_{ij}\,(x_i + y_j) = \sum_i \sum_j p_{ij}\, x_i + \sum_i \sum_j p_{ij}\, y_j \tag{15}$$


All the sums go from 1 to N. We can add in any order. For the first term on the right side,
add the $p_{ij}$ along row i of the probability matrix P to get $p_i$. That double sum gives E[x]:

$$\sum_i \sum_j p_{ij}\, x_i = \sum_i (p_{i1} + \cdots + p_{iN})\, x_i = \sum_i p_i x_i = E[x]$$


For the last term, add $p_{ij}$ down column j of the matrix to get the probability $P_j$ of $y_j$.
Those pairs $(x_1, y_j)$ and $(x_2, y_j)$ and ... and $(x_N, y_j)$ are all the ways to produce $y_j$:

$$\sum_i \sum_j p_{ij}\, y_j = \sum_j (p_{1j} + \cdots + p_{Nj})\, y_j = \sum_j P_j\, y_j = E[y]$$


Now equation (15) says that E[x + y] = E[x] + E[y].
What about the variance of z = x + y? The joint probabilities $p_{ij}$ and the covariance
$\sigma_{xy}$ will be involved. Let me separate the variance of x + y into three simple pieces:

$$\sigma_z^2 = \sum\sum p_{ij}\,(x_i + y_j - m_x - m_y)^2 = \sum\sum p_{ij}\,(x_i - m_x)^2 + \sum\sum p_{ij}\,(y_j - m_y)^2 + 2\sum\sum p_{ij}\,(x_i - m_x)(y_j - m_y)$$

The first piece is $\sigma_x^2$. The second piece is $\sigma_y^2$. The last piece is $2\sigma_{xy}$.

The variance of z = x + y is
$$\sigma_z^2 = \sigma_x^2 + \sigma_y^2 + 2\sigma_{xy}. \tag{16}$$


The Covariance Matrix for Z = AX 


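Here is a good way to see $\sigma_z^2$ when z = x + y. Think of (x, y) as a column vector X.
Think of the 1 by 2 matrix $A = [\,1 \;\; 1\,]$ multiplying that vector X. Then AX is the sum
z = x + y. The variance $\sigma_z^2$ in equation (16) goes into matrix notation as

$$\sigma_z^2 = \begin{bmatrix} 1 & 1 \end{bmatrix} \begin{bmatrix} \sigma_x^2 & \sigma_{xy} \\ \sigma_{xy} & \sigma_y^2 \end{bmatrix} \begin{bmatrix} 1 \\ 1 \end{bmatrix} \quad\text{which is}\quad \sigma_z^2 = A V A^T. \tag{17}$$

You can see that $\sigma_z^2 = AVA^T$ in (17) agrees with $\sigma_x^2 + \sigma_y^2 + 2\sigma_{xy}$ in (16).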


Now for the main point. The vector X could have M components coming from M 
experiments (instead of only 2). Those experiments will have an M by M covariance 
matrix $V_X$. The matrix A could be K by M. Then AX is a vector with K combinations
of the M outputs (instead of 1 combination x + y of 2 outputs). 

That vector Z = AX of length K has a K by K covariance matrix $V_Z$. Then the
great rule for covariance matrices (of which equation (17) was only a 1 by 2 example)
is this beautiful formula: Covariance matrix of AX is A (covariance matrix of X) $A^T$:

The covariance matrix of Z = AX is
$$V_Z = A V_X A^T \tag{18}$$


To me, this neat formula shows the beauty of matrix multiplication. I won’t prove this 
formula, just admire it. It is constantly used in applications—coming in Section 12.3. 
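
A sketch of the rule $V_Z = A V_X A^T$ in plain Python, with no library assumed. The covariance matrix $V_X$ and the matrix A below are made-up numbers for illustration. The rows of A form the sum x + y and the difference x − y.

```python
def matmul(A, B):
    # Product of two matrices stored as lists of rows.
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def transpose(A):
    return [list(col) for col in zip(*A)]

def transform_covariance(A, V_X):
    # V_Z = A V_X A^T, equation (18)
    return matmul(matmul(A, V_X), transpose(A))

V_X = [[4.0, 1.0],
       [1.0, 9.0]]          # hypothetical: variances 4 and 9, covariance 1
A = [[1.0, 1.0],            # first row gives the sum x + y
     [1.0, -1.0]]           # second row gives the difference x - y
V_Z = transform_covariance(A, V_X)
print(V_Z)
```

The (1,1) entry is 4 + 9 + 2(1) = 15, in agreement with equation (16). The (2,2) entry is 4 + 9 − 2(1) = 11 for the difference.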

The Correlation $\rho$


Correlation $\rho_{xy}$ is closely related to covariance $\sigma_{xy}$. They both measure dependence or
independence. Start by rescaling or “standardizing” the random variables x and y.
The new $X = x/\sigma_x$ and $Y = y/\sigma_y$ have variance $\sigma_X^2 = \sigma_Y^2 = 1$. This is just like
dividing a vector v by its length to produce a unit vector $v/\|v\|$ of length 1.

The correlation of x and y is the covariance of X and Y. If the original covariance
of x and y was $\sigma_{xy}$, then rescaling to X and Y will divide by $\sigma_x$ and $\sigma_y$:

Correlation
$$\rho_{xy} = \frac{\sigma_{xy}}{\sigma_x \sigma_y} = \text{covariance of } \frac{x}{\sigma_x} \text{ and } \frac{y}{\sigma_y} \qquad \text{Always } -1 \le \rho_{xy} \le 1$$


Zero covariance gives zero correlation. Independent random variables produce $\rho_{xy} = 0$.

We know that always $\sigma_{xy}^2 \le \sigma_x^2 \sigma_y^2$ (the covariance matrix V is at least positive
semidefinite). Then $\rho_{xy}^2 \le 1$. Correlation near $\rho = +1$ means strong dependence in
the same direction: often voting the same. Negative correlation means that y tends to be
below its mean when x is above its mean: Voting in opposite directions.


Example 3 Suppose that y is just −x. A coin flip has outputs x = 0 or 1. The same flip
has outputs y = 0 or −1. The mean $m_x$ is $\frac12$ for a fair coin, and $m_y$ is $-\frac12$. The covariance
is $\sigma_{xy} = -\sigma_x \sigma_y$. The correlation divides by $\sigma_x \sigma_y$ to get $\rho_{xy} = -1$. In this case the
correlation matrix R has determinant zero (singular and only semidefinite):

Correlation matrix
$$R = \begin{bmatrix} 1 & \rho_{xy} \\ \rho_{xy} & 1 \end{bmatrix} \qquad R = \begin{bmatrix} 1 & -1 \\ -1 & 1 \end{bmatrix} \text{ when } y = -x$$

R always has 1’s on the diagonal because we normalized to $\sigma_X = \sigma_Y = 1$. R is the
correlation matrix for x and y, and the covariance matrix for $X = x/\sigma_x$ and $Y = y/\sigma_y$.
That number $\rho_{xy}$ is also called the Pearson coefficient.


Example 4 Suppose the random variables x, y, z are independent. What matrix is R? 


Answer R is the identity matrix. All three correlations $\rho_{xx}, \rho_{yy}, \rho_{zz}$ are 1 by definition.
All three cross-correlations $\rho_{xy}, \rho_{xz}, \rho_{yz}$ are zero by independence.


The correlation matrix R comes from the covariance matrix V, when we rescale every 
row and every column. Divide each row i and column i by the ith standard deviation $\sigma_i$.

(a) R = DVD for the diagonal matrix $D = \text{diag}\,[1/\sigma_1, \ldots, 1/\sigma_M]$.

(b) If covariance V is positive definite, correlation R = DVD is also positive definite.
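
The rescaling R = DVD takes only a few lines. This is a minimal sketch. The function name and the 2 by 2 covariance matrix are made-up for illustration.

```python
import math

def correlation_matrix(V):
    # d_i = 1 / sigma_i, where sigma_i^2 is the ith diagonal entry of V
    d = [1.0 / math.sqrt(V[i][i]) for i in range(len(V))]
    # R = D V D: divide row i and column j by sigma_i and sigma_j
    return [[d[i] * V[i][j] * d[j] for j in range(len(V))] for i in range(len(V))]

V = [[4.0, 3.0],
     [3.0, 9.0]]            # hypothetical: sigma_1 = 2, sigma_2 = 3, covariance 3
R = correlation_matrix(V)
print(R)
```

The diagonal of R is 1 (up to roundoff) and the off-diagonal entry is the correlation 3/(2·3) = 1/2.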

■ WORKED EXAMPLES ■


12.2 A Suppose x and y are independent random variables with mean 0 and variance 1.
Then the covariance matrix $V_X$ for X = (x, y) is the 2 by 2 identity matrix. What are the
mean $m_Z$ and the covariance matrix $V_Z$ for the 3-component vector Z = (x, y, ax + by)?

Solution

Z is connected to X by A
$$Z = \begin{bmatrix} x \\ y \\ ax + by \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 0 & 1 \\ a & b \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix} = AX.$$


The vector $m_X$ contains the means of the M components of X. The vector $m_Z$ contains
the means of the K components of Z = AX. The matrix connection between the means
of X and Z has to be linear: $m_Z = A m_X$. The mean of ax + by is $a m_x + b m_y$.

The covariance matrix for Z is $V_Z = AA^T$, when $V_X$ is the 2 by 2 identity matrix:

$$V_Z = \text{covariance matrix for } Z = (x, y, ax + by) = \begin{bmatrix} 1 & 0 \\ 0 & 1 \\ a & b \end{bmatrix} \begin{bmatrix} 1 & 0 & a \\ 0 & 1 & b \end{bmatrix} = \begin{bmatrix} 1 & 0 & a \\ 0 & 1 & b \\ a & b & a^2 + b^2 \end{bmatrix}$$
Interpretation: x and y are independent so $\sigma_{xy} = 0$. Then the covariance of x with
ax + by is a and the covariance of y with ax + by is b. Those just come from the two
independent parts of ax + by. Finally, equation (18) gives the variance of ax + by:

Use $V_Z = A V_X A^T$
$$\sigma_{ax+by}^2 = \sigma_{ax}^2 + \sigma_{by}^2 + 2\sigma_{ax,by} = a^2 + b^2 + 0.$$

The 3 by 3 matrix $V_Z$ is singular. Its determinant is $a^2 + b^2 - a^2 - b^2 = 0$. The third
component z = ax + by is completely dependent on x and y. The rank of $V_Z$ is only 2.


GPS Example The signal from a GPS satellite includes its departure time. The receiver 
clock gives the arrival time. The receiver multiplies the travel time by the speed of light. 
Then it knows the distance from that satellite. Distances from four or more satellites 
pinpoint the receiver position (using least squares !). 

One problem: The speed of light changes in the ionosphere. But the correction 
will be almost the same for all nearby receivers. If one receiver stays in a known position, 
we can take differences from that position. Differential GPS reduces the error variance: 


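Difference matrix $A = \begin{bmatrix} 1 & -1 \end{bmatrix}$. Covariance matrix
$$V_Z = A V_X A^T = \begin{bmatrix} 1 & -1 \end{bmatrix} \begin{bmatrix} \sigma_1^2 & \sigma_{12} \\ \sigma_{12} & \sigma_2^2 \end{bmatrix} \begin{bmatrix} 1 \\ -1 \end{bmatrix} = \sigma_1^2 - 2\sigma_{12} + \sigma_2^2$$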


Errors in the speed of light are gone. Then centimeter positioning accuracy is achievable. 
(The key ideas are on page 320 of Algorithms for Global Positioning by Borre and Strang.) 
The GPS world is all about time and space and amazing accuracy. 

Problem Set 12.2


1 (a) Compute the variance $\sigma^2$ when the coin flip probabilities are p and 1 − p
(tails = 0, heads = 1).
(b) The sum of N independent flips (0 or 1) is the count of heads after N tries.
The rule (16-17-18) for the variance of a sum gives $\sigma^2$ = ____.


2 What is the covariance $\sigma_{35}$ between the results $x_1, \ldots, x_n$ of Experiment 3 and the
results $y_1, \ldots, y_n$ of Experiment 5? Your formula will look like $\sigma_{12}$ in equation (2).
Then the (3, 5) and (5, 3) entries of the covariance matrix V are $\sigma_{35} = \sigma_{53}$.


3 For M = 3 experiments, the variance-covariance matrix V will be 3 by 3. There 
will be a probability $p_{ijk}$ that the three outputs are $x_i$ and $y_j$ and $z_k$. Write down a
formula like equation (7) for the matrix V.


4 What is the covariance matrix V for M = 3 independent experiments with means
$m_1, m_2, m_3$ and variances $\sigma_1^2, \sigma_2^2, \sigma_3^2$?


Problems 5-9 are about the conditional probability that $Y = y_j$ when we know $X = x_i$.

Notation: $\text{Prob}(Y = y_j \mid X = x_i)$ = probability of the outcome $y_j$ given that $X = x_i$.


Example 1 Coin 1 is glued to coin 2. Then Prob(Y = heads when X = heads) is 1. 
Example 2 Independent coin flips: X gives no information about Y. Useless to know X. 
Then Prob (Y = heads |X = heads) is the same as Prob (Y = heads). 


5 Explain the sum rule of conditional probability:
$\text{Prob}(Y = y_j)$ = sum over all outputs $x_i$ of $\text{Prob}(Y = y_j \mid X = x_i)\,\text{Prob}(X = x_i)$.

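6 The n by n matrix P contains joint probabilities $p_{ij} = \text{Prob}(X = x_i \text{ and } Y = y_j)$.
Explain why the conditional $\text{Prob}(Y = y_j \mid X = x_i)$ equals
$$\frac{p_{ij}}{p_{i1} + \cdots + p_{in}} = \frac{p_{ij}}{p_i}.$$

7 For this joint probability matrix with $\text{Prob}(x_1, y_2) = 0.3$, find $\text{Prob}(y_2 \mid x_1)$ and $\text{Prob}(x_1)$.
$$P = \begin{bmatrix} p_{11} & p_{12} \\ p_{21} & p_{22} \end{bmatrix} = \begin{bmatrix} 0.1 & 0.3 \\ 0.2 & 0.4 \end{bmatrix}$$
The entries $p_{ij}$ add to 1. Some i, j must happen.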


8 Explain the product rule of conditional probability:
$p_{ij} = \text{Prob}(X = x_i \text{ and } Y = y_j)$ equals $\text{Prob}(Y = y_j \mid X = x_i)$ times $\text{Prob}(X = x_i)$.


9 Derive this Bayes Theorem for $p_{ij}$ from the product rule in Problem 8:

$$\text{Prob}(Y = y_j \mid X = x_i) = \frac{\text{Prob}(X = x_i \mid Y = y_j)\,\text{Prob}(Y = y_j)}{\text{Prob}(X = x_i)}$$


“Bayesians” use prior information. “Frequentists” only use sampling information. 

12.3 Multivariate Gaussian and Weighted Least Squares


The normal probability density p(x) (the Gaussian) depends on only two numbers:

Mean m and variance $\sigma^2$
$$p(x) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-(x - m)^2 / 2\sigma^2} \tag{1}$$


The graph of p(x) is a bell-shaped curve centered at x = m. The continuous variable x
can be anywhere between $-\infty$ and $\infty$. With probability close to $\frac23$, that random x will lie
between $m - \sigma$ and $m + \sigma$ (less than one standard deviation $\sigma$ from its mean value m).

$$\int_{-\infty}^{\infty} p(x)\,dx = 1 \quad\text{and}\quad \int_{m-\sigma}^{m+\sigma} p(x)\,dx = \frac{1}{\sqrt{2\pi}} \int_{-1}^{1} e^{-X^2/2}\,dX \approx \frac{2}{3}. \tag{2}$$


That integral has a change of variables from x to $X = (x - m)/\sigma$. This simplifies the
exponent to $-X^2/2$ and it simplifies the limits of integration to −1 and 1. Even the $1/\sigma$
from p disappears outside the integral because dX equals $dx/\sigma$. Every Gaussian turns
into a standard Gaussian p(X) with mean m = 0 and variance $\sigma^2 = 1$. Just call it p(x):

The standard normal distribution N(0, 1) has
$$p(x) = \frac{1}{\sqrt{2\pi}}\, e^{-x^2/2} \tag{3}$$


Integrating p(x) from —oo to x gives the cumulative distribution F(x): the probability 
that a random sample is below x. That probability will be $F = \frac12$ at x = 0 (the mean).
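
The cumulative distribution F has no elementary formula, but it can be written with the error function. This is a small sketch using `math.erf` from the Python standard library. The mean 5 and standard deviation 2 are arbitrary illustration values. The answer for "within one standard deviation" does not depend on them.

```python
import math

def normal_cdf(x, m=0.0, sigma=1.0):
    # F(x) = probability that a normal sample with mean m and variance sigma^2 is below x
    return 0.5 * (1.0 + math.erf((x - m) / (sigma * math.sqrt(2.0))))

m, sigma = 5.0, 2.0      # arbitrary illustration values
within_one_sigma = normal_cdf(m + sigma, m, sigma) - normal_cdf(m - sigma, m, sigma)
at_mean = normal_cdf(m, m, sigma)
print(within_one_sigma, at_mean)
```

The first number is about 0.68 (the "close to 2/3" in the text) and the second is 1/2.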


Two-dimensional Gaussians 


Now we have M = 2 Gaussian random variables x and y. They have means $m_1$ and $m_2$.
They have variances $\sigma_1^2$ and $\sigma_2^2$. If they are independent, then their probability density
p(x, y) is just $p_1(x)$ times $p_2(y)$. Multiply probabilities when variables are independent:

Independent x and y
$$p(x, y) = \frac{1}{2\pi \sigma_1 \sigma_2}\, e^{-(x - m_1)^2 / 2\sigma_1^2}\, e^{-(y - m_2)^2 / 2\sigma_2^2} \tag{4}$$

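The covariance of x and y will be $\sigma_{12} = 0$. The covariance matrix V will be diagonal.
The variances $\sigma_1^2$ and $\sigma_2^2$ are always on the main diagonal of V. The exponent in
p(x, y) is just the sum of the x-exponent and the y-exponent. Good to notice that the two
exponents can be combined into $-\frac12 (\boldsymbol{x} - \boldsymbol{m})^T V^{-1} (\boldsymbol{x} - \boldsymbol{m})$ with $V^{-1}$ in the middle:

$$-\frac{(x - m_1)^2}{2\sigma_1^2} - \frac{(y - m_2)^2}{2\sigma_2^2} = -\frac12 \begin{bmatrix} x - m_1 & y - m_2 \end{bmatrix} \begin{bmatrix} \sigma_1^2 & 0 \\ 0 & \sigma_2^2 \end{bmatrix}^{-1} \begin{bmatrix} x - m_1 \\ y - m_2 \end{bmatrix} \tag{5}$$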

Non-independent x and y


We are ready to give up independence. The exponent (5) with $V^{-1}$ is still correct when V is
no longer a diagonal matrix. Now the Gaussian depends on a vector $\boldsymbol{m}$ and a matrix V.

When M = 2, the first variable x may give partial information about the second
variable y (and vice versa). Maybe part of y is decided by x and part is truly independent.
It is the M by M covariance matrix V that accounts for dependencies between the M
variables $\boldsymbol{x} = x_1, \ldots, x_M$. Its inverse $V^{-1}$ goes into $p(\boldsymbol{x})$:

Multivariate Gaussian probability distribution
$$p(\boldsymbol{x}) = \frac{1}{\left(\sqrt{2\pi}\right)^M \sqrt{\det V}}\; e^{-(\boldsymbol{x} - \boldsymbol{m})^T V^{-1} (\boldsymbol{x} - \boldsymbol{m})/2} \tag{6}$$


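The vectors $\boldsymbol{x} = (x_1, \ldots, x_M)$ and $\boldsymbol{m} = (m_1, \ldots, m_M)$ contain the random variables and
their means. The M square roots of $2\pi$ and the determinant of V are included to make the
total probability equal to 1. Let me check that by linear algebra. I use the eigenvalues $\lambda$ and
orthonormal eigenvectors q of the symmetric matrix $V = Q \Lambda Q^T$. So $V^{-1} = Q \Lambda^{-1} Q^T$:

$$X = \boldsymbol{x} - \boldsymbol{m} \qquad (\boldsymbol{x} - \boldsymbol{m})^T V^{-1} (\boldsymbol{x} - \boldsymbol{m}) = X^T Q \Lambda^{-1} Q^T X = Y^T \Lambda^{-1} Y$$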


Notice! The combinations $Y = Q^T X = Q^T(\boldsymbol{x} - \boldsymbol{m})$ are statistically independent.
Their covariance matrix $\Lambda$ is diagonal.

This step of diagonalizing V by its eigenvector matrix Q is the same as “uncorrelating”
the random variables. Covariances are zero for the new variables $Y_1, \ldots, Y_M$. This is the
point where linear algebra helps calculus to compute multidimensional integrals.


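The integral of $p(\boldsymbol{x})$ is not changed when we center the variable $\boldsymbol{x}$ by subtracting $\boldsymbol{m}$
to reach X, and rotate that variable to reach $Y = Q^T X$. The matrix $\Lambda$ is diagonal!
So the integral we want splits into M separate one-dimensional integrals that we know:

$$\int \cdots \int e^{-Y^T \Lambda^{-1} Y / 2}\, dY = \left(\int_{-\infty}^{\infty} e^{-y_1^2 / 2\lambda_1}\, dy_1\right) \cdots \left(\int_{-\infty}^{\infty} e^{-y_M^2 / 2\lambda_M}\, dy_M\right) = \left(\sqrt{2\pi\lambda_1}\right) \cdots \left(\sqrt{2\pi\lambda_M}\right) = \left(\sqrt{2\pi}\right)^M \sqrt{\det V}. \tag{7}$$

The determinant of V (also the determinant of $\Lambda$) is the product $(\lambda_1) \cdots (\lambda_M)$ of
the eigenvalues. Then (7) gives the correct number to divide by so that $p(x_1, \ldots, x_M)$
in equation (6) has integral = 1 as desired.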

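The mean and variance of $p(\boldsymbol{x})$ are also M-dimensional integrals. The same idea of
diagonalizing V by its eigenvectors and introducing $Y = Q^T X$ will find those integrals:

Vector $\boldsymbol{m}$ of means
$$\int \cdots \int \boldsymbol{x}\, p(\boldsymbol{x})\, d\boldsymbol{x} = (m_1, m_2, \ldots) = \boldsymbol{m} \tag{8}$$
Covariance matrix V
$$\int \cdots \int (\boldsymbol{x} - \boldsymbol{m})\, p(\boldsymbol{x})\, (\boldsymbol{x} - \boldsymbol{m})^T\, d\boldsymbol{x} = V. \tag{9}$$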


Conclusion: Formula (6) for the probability density p(x) has all the properties we want. 
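
A numerical sanity check of formula (6) for M = 2. This is only a sketch: the mean vector, the non-diagonal covariance matrix V, and the grid spacing are all made-up illustration values. The integral is approximated by a plain Riemann sum on a square grid around the mean.

```python
import math

def gaussian_2d(x, y, m, V):
    # Formula (6) with M = 2: the inverse and determinant of a 2 by 2 matrix are explicit.
    det = V[0][0] * V[1][1] - V[0][1] * V[1][0]
    inv = [[V[1][1] / det, -V[0][1] / det],
           [-V[1][0] / det, V[0][0] / det]]
    dx, dy = x - m[0], y - m[1]
    quad = dx * (inv[0][0] * dx + inv[0][1] * dy) + dy * (inv[1][0] * dx + inv[1][1] * dy)
    return math.exp(-0.5 * quad) / (2.0 * math.pi * math.sqrt(det))

m = (1.0, -1.0)                  # hypothetical means
V = [[2.0, 0.8], [0.8, 1.0]]     # hypothetical positive definite covariance matrix
h = 0.1                          # grid spacing
n = 100                          # grid covers the mean plus or minus 10 in each direction

total = 0.0
for i in range(-n, n + 1):
    for j in range(-n, n + 1):
        total += gaussian_2d(m[0] + i * h, m[1] + j * h, m, V) * h * h
print(total)
```

The sum comes out very close to 1, which is what the factor $(\sqrt{2\pi})^M \sqrt{\det V}$ is designed to achieve.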

Weighted Least Squares


In Chapter 4, least squares started from an unsolvable system Ax = b. We chose $\hat{x}$ to
minimize the error $\|b - Ax\|^2$. That led us to the least squares equation $A^T A \hat{x} = A^T b$.
The best $A\hat{x}$ is the projection of b onto the column space of A. But is this squared
distance $E = \|b - Ax\|^2$ the right error measure to minimize?

If the measurement errors in b are independent random variables, with mean m = 0
and variance $\sigma^2 = 1$ and a normal distribution, Gauss would say yes: Use least squares.
If the errors are not independent or their variances are not equal, Gauss would say no:
Use weighted least squares. This section will show that the good measure of error is
$E = (b - Ax)^T V^{-1} (b - Ax)$. The equation for the best $\hat{x}$ uses the covariance matrix V:

Weighted least squares
$$A^T V^{-1} A \hat{x} = A^T V^{-1} b. \tag{10}$$


The most important examples have m independent errors in b. Those errors have
variances $\sigma_1^2, \ldots, \sigma_m^2$. By independence, V is a diagonal matrix. The good weights
$1/\sigma_1^2, \ldots, 1/\sigma_m^2$ come from $V^{-1}$. We are weighting the errors in b to have variance = 1:

Weighted least squares, independent errors in b
$$\text{Minimize } \sum_{i=1}^{m} \frac{(b - Ax)_i^2}{\sigma_i^2} \tag{11}$$

By weighting the errors, we are “whitening” the noise. White noise is a quick description
of independent errors based on the standard Gaussian N(0, 1) with mean zero and $\sigma^2 = 1$.
Let me write down the steps to equations (10) and (11) for the best $\hat{x}$:


Start with Ax = b (m equations, n unknowns, m > n, no solution)
Each right side $b_i$ has mean zero and variance $\sigma_i^2$. The $b_i$ are independent.
Divide the ith equation by $\sigma_i$ to have variance = 1 for every $b_i/\sigma_i$
That division turns Ax = b into $V^{-1/2} A x = V^{-1/2} b$ with $V^{-1/2} = \text{diag}\,(1/\sigma_1, \ldots, 1/\sigma_m)$
Ordinary least squares on those weighted equations has $A \to V^{-1/2} A$ and $b \to V^{-1/2} b$

$$(V^{-1/2} A)^T (V^{-1/2} A)\, \hat{x} = (V^{-1/2} A)^T V^{-1/2} b \quad\text{is}\quad A^T V^{-1} A \hat{x} = A^T V^{-1} b. \tag{12}$$

Because of $1/\sigma^2$ in $V^{-1}$, more reliable equations (smaller $\sigma$) get heavier weights. This is
the main point of weighted least squares.


Those diagonal weightings (uncoupled equations) are the most frequent and the simplest.
They apply to independent errors in the $b_i$. When these measurement errors are not
independent, V is no longer diagonal, but (12) is still the correct weighted equation.

In practice, finding all the covariances can be serious work. Diagonal V is simpler. 
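
For diagonal V the weighted equation (12) is easy to code. The sketch below fits a straight line C + Dt, so the rows of A are (1, $t_i$). The function name, the data points, and the standard deviations are hypothetical illustration values. The 2 by 2 system $A^T V^{-1} A \hat{x} = A^T V^{-1} b$ is solved by Cramer's rule.

```python
def weighted_line_fit(t, b, sigma):
    # Weights 1/sigma_i^2 are the diagonal entries of V^{-1}.
    w = [1.0 / (s * s) for s in sigma]
    # Entries of A^T V^{-1} A and A^T V^{-1} b when the rows of A are (1, t_i).
    s_w = sum(w)
    s_t = sum(wi * ti for wi, ti in zip(w, t))
    s_tt = sum(wi * ti * ti for wi, ti in zip(w, t))
    s_b = sum(wi * bi for wi, bi in zip(w, b))
    s_tb = sum(wi * ti * bi for wi, ti, bi in zip(w, t, b))
    det = s_w * s_tt - s_t * s_t
    C = (s_b * s_tt - s_t * s_tb) / det     # Cramer's rule for the 2 by 2 system
    D = (s_w * s_tb - s_t * s_b) / det
    return C, D

t = [0.0, 1.0, 2.0]
b = [6.0, 0.0, 0.0]                          # hypothetical measurements
C_equal, D_equal = weighted_line_fit(t, b, [1.0, 1.0, 1.0])     # ordinary least squares
C_weighted, D_weighted = weighted_line_fit(t, b, [1.0, 1.0, 10.0])  # third point unreliable
print(C_equal, D_equal, C_weighted, D_weighted)
```

With equal variances the best line is 5 − 3t. When the third measurement has standard deviation 10, it gets weight 1/100 and the line moves close to 6 − 6t, which passes through the first two points.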

The Variance in the Estimated $\hat{x}$


One more point: Often the important question is not the best $\hat{x}$ for one particular set of
measurements b. This is only one sample! The real goal is to know the reliability of the
whole experiment. That is measured (as reliability always is) by the variance in the
estimate $\hat{x}$. First, zero mean in b gives zero mean in $\hat{x}$. Then the formula connecting
variance V in the inputs b to variance W in the outputs $\hat{x}$ turns out to be beautiful:

Variance-covariance matrix W for $\hat{x}$
$$E\left[(\hat{x} - x)(\hat{x} - x)^T\right] = (A^T V^{-1} A)^{-1}. \tag{13}$$


That smallest possible variance comes from the best possible weighting, which is $V^{-1}$.
This key formula is a perfect application of Section 12.2. If b has covariance matrix
V, then $\hat{x} = Lb$ has covariance matrix $L V L^T$. Equation (12) above tells us that L is
$(A^T V^{-1} A)^{-1} A^T V^{-1}$. Now substitute this into $L V L^T$ and watch equation (13) appear:

$$L V L^T = (A^T V^{-1} A)^{-1} A^T V^{-1}\; V\; V^{-1} A (A^T V^{-1} A)^{-1} = (A^T V^{-1} A)^{-1}$$

This is the covariance W of the output, our best estimate $\hat{x}$. It is time for examples.


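Example 1  Suppose a doctor measures your heart rate $x$ three times ($m = 3$, $n = 1$):

$x = b_1,\; x = b_2,\; x = b_3$  is  $Ax = b$  with  $A = \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix}$  and  $V = \begin{bmatrix} \sigma_1^2 & 0 & 0 \\ 0 & \sigma_2^2 & 0 \\ 0 & 0 & \sigma_3^2 \end{bmatrix}$

The variances could be $\sigma_1^2 = 1/9$ and $\sigma_2^2 = 1/4$ and $\sigma_3^2 = 1$. You are getting more nervous
as measurements are taken: $b_3$ is less reliable than $b_2$ and $b_1$. All three measurements
contain some information, so they all go into the best (weighted) estimate $\hat{x}$:

$V^{-1/2} A \hat{x} = V^{-1/2} b$  is  $3x = 3b_1,\; 2x = 2b_2,\; 1x = 1b_3$  leading to  $A^T V^{-1} A \hat{x} = A^T V^{-1} b$

$\begin{bmatrix} 1 & 1 & 1 \end{bmatrix} \begin{bmatrix} 9 & & \\ & 4 & \\ & & 1 \end{bmatrix} \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix} \hat{x} = \begin{bmatrix} 1 & 1 & 1 \end{bmatrix} \begin{bmatrix} 9 & & \\ & 4 & \\ & & 1 \end{bmatrix} \begin{bmatrix} b_1 \\ b_2 \\ b_3 \end{bmatrix}$

$\hat{x} = \dfrac{9b_1 + 4b_2 + b_3}{14}$  is a weighted average of $b_1, b_2, b_3$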
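
Most weight is on $b_1$ since its variance $\sigma_1^2$ is smallest. The variance of $\hat{x}$ has the beautiful
formula $W = (A^T V^{-1} A)^{-1} = 1/14$:

Variance of $\hat{x}$:   $\left( \begin{bmatrix} 1 & 1 & 1 \end{bmatrix} \begin{bmatrix} 9 & & \\ & 4 & \\ & & 1 \end{bmatrix} \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix} \right)^{-1} = \dfrac{1}{14}$  is smaller than  $\dfrac{1}{9}$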
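Example 1 is easy to check with exact fractions. This is a minimal sketch, assuming three made-up readings 70, 72, 80 (the text gives only the variances). The helper name `weighted_estimate` is hypothetical, and it handles only the special case where $A$ is a column of ones.

```python
from fractions import Fraction


def weighted_estimate(b, variances):
    """Best estimate of one unknown x from equations x = b_i with independent errors.

    Returns (x_hat, W): the weighted average and its variance (A^T V^{-1} A)^{-1}
    when A is a column of ones.
    """
    weights = [1 / Fraction(v) for v in variances]    # 1 / sigma_i^2
    total = sum(weights)                              # A^T V^{-1} A (a 1 by 1 matrix)
    x_hat = sum(w * bi for w, bi in zip(weights, b)) / total
    return x_hat, 1 / total


# Variances 1/9, 1/4, 1 from Example 1; the readings are invented for illustration.
x_hat, W = weighted_estimate([70, 72, 80], [Fraction(1, 9), Fraction(1, 4), 1])
print(x_hat, W)   # prints 499/7 1/14
```

The weights are 9, 4, 1, so $\hat{x} = (9 \cdot 70 + 4 \cdot 72 + 80)/14 = 499/7$. The variance $1/14$ does not depend on the readings at all.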


The BLUE theorem of Gauss (proved on the website) says that our $\hat{x} = Lb$ is the best
linear unbiased estimate of the solution to $Ax = b$. Any other unbiased choice $x^* = L^* b$
has greater variance than $\hat{x}$. All unbiased choices have $L^* A = I$ so that an exact $Ax = b$
will produce the right answer $x = L^* b = L^* A x$.


Note. I must add that there are reasons not to minimize squared errors in the first place.
One reason: This $\hat{x}$ often has many small components. The squares of small numbers
are very small, and they appear when we minimize. It is easier to make sense of sparse
vectors, with only a few nonzeros. Statisticians often prefer to minimize unsquared errors:
the sum of $|(b - Ax)_i|$. This error measure is $\ell^1$ instead of $\ell^2$. Because of the
absolute values, the equation for $\hat{x}$ becomes nonlinear (it is actually piecewise linear).

Fast new algorithms are computing a sparse $\hat{x}$ quickly and the future may belong to $\ell^1$.
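The difference between the two error measures already shows up with one unknown. For the equations $x = b_i$, the sum of squares is minimized by the mean of the $b_i$ and the sum of absolute values is minimized by a median (a standard fact, not stated in the text). The sketch below uses made-up data with one outlier. It only compares the two error measures and is not one of the fast sparse algorithms mentioned above.

```python
from statistics import mean, median


def l2_error(x, b):
    """Sum of squared errors (b_i - x)^2."""
    return sum((bi - x) ** 2 for bi in b)


def l1_error(x, b):
    """Sum of unsquared errors |b_i - x|."""
    return sum(abs(bi - x) for bi in b)


b = [70, 71, 72, 73, 120]     # invented readings; 120 is an outlier
x_l2 = mean(b)                # minimizes the sum of squares
x_l1 = median(b)              # minimizes the sum of absolute values

print(x_l2, l2_error(x_l2, b), l1_error(x_l2, b))
print(x_l1, l2_error(x_l1, b), l1_error(x_l1, b))
```

The outlier drags the mean well away from the cluster of readings, while the median stays inside it.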


The Kalman Filter 


The “Kalman filter” is the great algorithm in dynamic least squares. That word dynamic
means that new measurements $b_k$ keep coming. So the best estimate $\hat{x}_k$ keeps changing
(based on all of $b_0, \ldots, b_k$). More than that, the matrix $A$ is also changing. So $\hat{x}_2$ will be
our best least squares estimate of the latest solution $x_2$ to the whole history of observation
equations and update equations (state equations) up to time 2:

$A_0 x_0 = b_0 \qquad x_1 = F_0 x_0 \qquad A_1 x_1 = b_1 \qquad x_2 = F_1 x_1 \qquad A_2 x_2 = b_2$   (14)

The Kalman idea is to introduce one equation at a time. There will be errors in each
equation. With every new equation, we update the best estimate $\hat{x}_k$ for the current $x_k$. But
history is not forgotten! This new estimate $\hat{x}_k$ uses all the past observations $b_0$ to $b_{k-1}$ and
all the state equations $x_{\text{new}} = F_{\text{old}}\, x_{\text{old}}$. A large and growing least squares problem.

One more important point. Each least squares equation is weighted using the
covariance matrix $V_k$ for the error in $b_k$. There is even a covariance matrix $C_k$ for
errors in the update equations $x_{k+1} = F_k x_k$. The best $\hat{x}_2$ then depends on $b_0, b_1, b_2$ and
$V_0, V_1, V_2$ and $C_1, C_2$. The good way to write $\hat{x}_k$ is as an update to the previous $\hat{x}_{k-1}$.

Let me concentrate on a simplified problem, without the matrices $F_k$ and the covariances
$C_k$. We are estimating the same true $x$ at every step. How do we get $\hat{x}_1$ from $\hat{x}_0$?


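OLD   $A_0 x_0 = b_0$ leads to the weighted equation $A_0^T V_0^{-1} A_0\, \hat{x}_0 = A_0^T V_0^{-1} b_0$.   (15)

NEW   $\begin{bmatrix} A_0 \\ A_1 \end{bmatrix} \hat{x}_1 = \begin{bmatrix} b_0 \\ b_1 \end{bmatrix}$ leads to the following weighted equation for $\hat{x}_1$:

$\begin{bmatrix} A_0^T & A_1^T \end{bmatrix} \begin{bmatrix} V_0^{-1} & \\ & V_1^{-1} \end{bmatrix} \begin{bmatrix} A_0 \\ A_1 \end{bmatrix} \hat{x}_1 = \begin{bmatrix} A_0^T & A_1^T \end{bmatrix} \begin{bmatrix} V_0^{-1} & \\ & V_1^{-1} \end{bmatrix} \begin{bmatrix} b_0 \\ b_1 \end{bmatrix}$.   (16)

Yes, we could just solve that new problem and forget the old one. But the old solution $\hat{x}_0$
needed work that we hope to reuse in $\hat{x}_1$. What we look for is an update to $\hat{x}_0$:

Kalman update gives $\hat{x}_1$ from $\hat{x}_0$:   $\hat{x}_1 = \hat{x}_0 + K_1 (b_1 - A_1 \hat{x}_0)$.   (17)

The update correction is the mismatch $b_1 - A_1 \hat{x}_0$ between the old state $\hat{x}_0$ and the new
measurements $b_1$, multiplied by the Kalman gain matrix $K_1$. The formula for $K_1$ comes
from comparing the solutions $\hat{x}_1$ and $\hat{x}_0$ to (15) and (16). And when we update $\hat{x}_0$ to $\hat{x}_1$
based on new data $b_1$, we also update the covariance matrix $W_0$ to $W_1$. Remember
$W_0 = (A_0^T V_0^{-1} A_0)^{-1}$ from equation (13). Update its inverse to $W_1^{-1}$:

Covariance $W_1$ of errors in $\hat{x}_1$:   $W_1^{-1} = W_0^{-1} + A_1^T V_1^{-1} A_1$   (18)
Kalman gain matrix $K_1$:   $K_1 = W_1 A_1^T V_1^{-1}$   (19)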


This is the heart of the Kalman filter. Notice the importance of the $W_k$. Those matrices
measure the reliability of the whole process, where the vector $\hat{x}_k$ estimates the current state
based on the particular measurements $b_0$ to $b_k$.

Whole chapters and whole books are written to explain the dynamic Kalman filter,
when the states $x_k$ are also changing (based on the matrices $F_k$). There is a prediction of
$\hat{x}_k$ using $F$, followed by a correction using the new data $b$. Perhaps best to stop here.

This page was about recursive least squares: adding new data $b_k$ and updating both
$\hat{x}$ and $W$: the best current estimate based on all the data, and its covariance matrix.
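Equations (17) to (19) become very short when there is one unknown and each new measurement is a single equation $x = b_k$, so that every $A_k$ is the 1 by 1 matrix $[1]$. The sketch below is an illustration under that assumption. The name `kalman_update` and the readings are made up, and the variances are the ones from Example 1. The recursion should arrive at the same $\hat{x}$ and $W$ as solving the whole weighted problem at once.

```python
from fractions import Fraction


def kalman_update(x_old, W_old, b_new, V_new):
    """One recursive least squares step with A_1 = [1] (scalar case only)."""
    W_new = 1 / (1 / W_old + 1 / V_new)     # (18): W_1^{-1} = W_0^{-1} + A_1^T V_1^{-1} A_1
    K = W_new / V_new                       # (19): K_1 = W_1 A_1^T V_1^{-1}
    x_new = x_old + K * (b_new - x_old)     # (17): correct by the mismatch b_1 - A_1 x_0
    return x_new, W_new


readings = [70, 72, 80]                                     # invented data
variances = [Fraction(1, 9), Fraction(1, 4), Fraction(1)]   # from Example 1

# Start from the first reading alone: x_hat_0 = b_0 with variance W_0 = V_0.
x_hat, W = Fraction(readings[0]), variances[0]
for b_k, V_k in zip(readings[1:], variances[1:]):
    x_hat, W = kalman_update(x_hat, W, b_k, V_k)

print(x_hat, W)   # prints 499/7 1/14
```

Only the current $\hat{x}$ and $W$ are stored between steps. The earlier readings are never revisited, yet the final answer agrees with the weighted average $(9b_1 + 4b_2 + b_3)/14$ of all three.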


Problem Set 12.3 


1 Two measurements of the same variable $x$ give two equations $x = b_1$ and $x = b_2$.
Suppose the means are zero and the variances are $\sigma_1^2$ and $\sigma_2^2$, with independent
errors: $V$ is diagonal with entries $\sigma_1^2$ and $\sigma_2^2$. Write the two equations as $Ax = b$
($A$ is 2 by 1). As in the text Example 1, find this best estimate $\hat{x}$ based on $b_1$ and $b_2$:

$\hat{x} = \dfrac{b_1/\sigma_1^2 + b_2/\sigma_2^2}{1/\sigma_1^2 + 1/\sigma_2^2} \qquad E[\hat{x}\hat{x}^T] = \left( \dfrac{1}{\sigma_1^2} + \dfrac{1}{\sigma_2^2} \right)^{-1}.$

2 (a) In Problem 1, suppose the second measurement $b_2$ becomes super-exact and its
variance $\sigma_2 \to 0$. What is the best estimate $\hat{x}$ when $\sigma_2$ reaches zero?

(b) The opposite case has $\sigma_2 \to \infty$ and no information in $b_2$. What is now the best
estimate $\hat{x}$ based on $b_1$ and $b_2$?

3 If $x$ and $y$ are independent with probabilities $p_1(x)$ and $p_2(y)$, then $p(x, y) =
p_1(x)\, p_2(y)$. By separating double integrals into products of single integrals
($-\infty$ to $\infty$) show that

$\iint p(x, y)\, dx\, dy = 1$  and  $\iint (x + y)\, p(x, y)\, dx\, dy = m_1 + m_2.$

4 Continue Problem 3 for independent $x, y$ to show that $p(x, y) = p_1(x)\, p_2(y)$ has

$\iint (x - m_1)^2\, p(x, y)\, dx\, dy = \sigma_1^2 \qquad \iint (x - m_1)(y - m_2)\, p(x, y)\, dx\, dy = 0.$

So the 2 by 2 covariance matrix $V$ is diagonal and its entries are ______.

5 Show that the inverse of a 2 by 2 covariance matrix $V$ is

$V^{-1} = \begin{bmatrix} \sigma_1^2 & \sigma_{12} \\ \sigma_{12} & \sigma_2^2 \end{bmatrix}^{-1} = \dfrac{1}{1 - \rho^2} \begin{bmatrix} 1/\sigma_1^2 & -\rho/\sigma_1\sigma_2 \\ -\rho/\sigma_1\sigma_2 & 1/\sigma_2^2 \end{bmatrix}$  with correlation  $\rho = \sigma_{12}/\sigma_1\sigma_2.$

This produces the exponent $-(x - m)^T V^{-1} (x - m)$ in a 2-variable Gaussian.


6 Suppose $\hat{x}_k$ is the average of $b_1, \ldots, b_k$. A new measurement $b_{k+1}$ arrives and we
want the new average $\hat{x}_{k+1}$. The Kalman update equation (17) is

New average   $\hat{x}_{k+1} = \hat{x}_k + \dfrac{1}{k+1} (b_{k+1} - \hat{x}_k).$

Verify that $\hat{x}_{k+1}$ is the correct average of $b_1, \ldots, b_{k+1}$.

7 Also check the update equation (18) for the variance $W_{k+1} = \sigma^2/(k+1)$ of this
average $\hat{x}$ assuming that $W_k = \sigma^2/k$ and $b_{k+1}$ has variance $V = \sigma^2$.

8 (Steady model) Problems 6-7 were static least squares. All the sample averages
$\hat{x}_k$ were estimates of the same $x$. To make the Kalman filter dynamic, include also
a state equation $x_{k+1} = F x_k$ with its own error variance $s^2$. The dynamic least
squares problem allows $x$ to “drift” as $k$ increases:

$\begin{bmatrix} 1 & \\ -F & 1 \\ & 1 \end{bmatrix} \begin{bmatrix} x_0 \\ x_1 \end{bmatrix} = \begin{bmatrix} b_0 \\ 0 \\ b_1 \end{bmatrix}$  with variances  $\begin{bmatrix} \sigma^2 \\ s^2 \\ \sigma^2 \end{bmatrix}$

With $F = 1$, divide both sides of those three equations by $\sigma$, $s$, and $\sigma$. Find $\hat{x}_0$
and $\hat{x}_1$ by least squares, which gives more weight to the recent $b_1$. The Kalman
filter is developed in Algorithms for Global Positioning (Borre and Strang, Wellesley-
Cambridge Press).

Change in $A^{-1}$ from a Change in $A$


This final page connects the beginning of the book (inverses and rank one matrices) with 
the end of the book (dynamic least squares and filters). Begin with this basic formula: 


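The inverse of $M = I - uv^T$ is $M^{-1} = I + \dfrac{uv^T}{1 - v^T u}$

The quickest proof is $MM^{-1} = I - uv^T + (I - uv^T)\, \dfrac{uv^T}{1 - v^T u} = I - uv^T + uv^T = I$

$M$ is not invertible if $v^T u = 1$ (then $Mu = 0$). Here $v^T = u^T = [\,1 \;\; 1 \;\; 1\,]$:

Example   The inverse of $M = I - \begin{bmatrix} 1 & 1 & 1 \\ 1 & 1 & 1 \\ 1 & 1 & 1 \end{bmatrix}$ is $M^{-1} = I + \dfrac{1}{1 - 3} \begin{bmatrix} 1 & 1 & 1 \\ 1 & 1 & 1 \\ 1 & 1 & 1 \end{bmatrix}$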
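That example can be confirmed by multiplying $M$ times $M^{-1}$. The following is a small sketch with exact fractions. The helper names `inverse_of_identity_minus_rank_one` and `matmul` are made up for this illustration.

```python
from fractions import Fraction


def inverse_of_identity_minus_rank_one(u, v):
    """Return M^{-1} = I + u v^T / (1 - v^T u) for M = I - u v^T, as a list of rows."""
    n = len(u)
    d = 1 - sum(Fraction(vi) * ui for vi, ui in zip(v, u))   # 1 - v^T u
    if d == 0:
        raise ValueError("v^T u = 1, so M = I - u v^T is not invertible")
    return [[(1 if i == j else 0) + Fraction(u[i]) * v[j] / d for j in range(n)]
            for i in range(n)]


def matmul(X, Y):
    """Plain matrix product of two lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]


u = v = [1, 1, 1]
M = [[(1 if i == j else 0) - u[i] * v[j] for j in range(3)] for i in range(3)]
M_inv = inverse_of_identity_minus_rank_one(u, v)   # I - (1/2) * (all-ones matrix)
print(matmul(M, M_inv) == [[1, 0, 0], [0, 1, 0], [0, 0, 1]])   # prints True
```

Here $v^T u = 3$, so the division is by $1 - 3 = -2$ and every entry of the all-ones matrix is multiplied by $-1/2$.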


But we don’t always start from the identity matrix. Many applications need to invert
$M = A - uv^T$. After we solve $Ax = b$ we expect a rank one change to give $My = b$.
The division by $1 - v^T u$ above will become a division by $c = 1 - v^T A^{-1} u = 1 - v^T z$.

Step 1   Solve $Az = u$ and compute $c = 1 - v^T z$.

Step 2   If $c \neq 0$ then $M^{-1} b$ is $y = x + \dfrac{v^T x}{c}\, z$.
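Steps 1 and 2 can be sketched in a few lines. The code below is an illustration, not the text's own program. The argument `solve_A` is a hypothetical callable that returns $A^{-1}$ times a right side (in practice it would reuse a stored factorization of $A$), and the diagonal $A$ and the vectors $u$, $v$, $b$ are made up so that the solves are trivial.

```python
from fractions import Fraction


def dot(p, q):
    return sum(pi * qi for pi, qi in zip(p, q))


def rank_one_update_solve(solve_A, u, v, b):
    """Solve (A - u v^T) y = b using only solves with A."""
    x = solve_A(b)                  # the original problem A x = b
    z = solve_A(u)                  # Step 1: solve A z = u
    c = 1 - dot(v, z)               # Step 1: c = 1 - v^T z
    if c == 0:
        raise ValueError("c = 0, so M = A - u v^T is not invertible")
    factor = dot(v, x) / c          # Step 2: y = x + (v^T x / c) z
    return [xi + factor * zi for xi, zi in zip(x, z)]


# Illustration with A = diag(2, 4, 5).
diag = [Fraction(2), Fraction(4), Fraction(5)]


def solve_A(rhs):
    return [Fraction(r) / d for r, d in zip(rhs, diag)]


u, v, b = [1, 0, 1], [1, 1, 0], [3, 1, 2]
y = rank_one_update_solve(solve_A, u, v, b)

# Check the answer by multiplying out M y = A y - u (v^T y).
My = [diag[i] * y[i] - u[i] * dot(v, y) for i in range(3)]
print(y, My == b)
```

The matrix $M$ is never formed or factored. The cost is two solves with $A$ plus a few dot products.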


Suppose A is easy to work with. A might already be factored into LU by elimination. 
Then this Sherman-Woodbury-Morrison formula is the fast way to solve My = b. 
Here are three problems to end the book ! 


9 Take Steps 1-2 to find $y$ when $A = I$ and $u^T = v^T = [\,1 \;\; 2 \;\; 3\,]$ and $b^T = [\,2 \;\; 1 \;\; 4\,]$.

10 Step 2 in this “update formula” claims that $My = (A - uv^T) \left( x + \dfrac{v^T x}{c}\, z \right) = b$.

Simplify this to $\dfrac{u\, v^T x}{c}\, [\,1 - c - v^T z\,] = 0$. This is true since $c = 1 - v^T z$.

11 When $A$ has a new row $v^T$, $A^T A$ in the least squares equation changes to $M$:

$M = \begin{bmatrix} A^T & v \end{bmatrix} \begin{bmatrix} A \\ v^T \end{bmatrix} = A^T A + vv^T$ = rank one change in $A^T A$.

Why is that multiplication correct? The updated $\hat{x}_{\text{new}}$ comes from Steps 1 and 2.


For reference here are four formulas for $M^{-1}$. The first two were given above, when the
change was $uv^T$. Formulas 3 and 4 go beyond rank one to allow matrices $U, V, W$.

1   $M = I - uv^T$  and  $M^{-1} = I + uv^T/(1 - v^T u)$   (rank 1 change)
2   $M = A - uv^T$  and  $M^{-1} = A^{-1} + A^{-1} uv^T A^{-1}/(1 - v^T A^{-1} u)$
3   $M = I - UV$  and  $M^{-1} = I_n + U (I_m - VU)^{-1} V$
4   $M = A - UW^{-1}V$  and  $M^{-1} = A^{-1} + A^{-1} U (W - V A^{-1} U)^{-1} V A^{-1}$


Formula 4 is the “matrix inversion lemma” in engineering. Not seen until now! 
The Kalman filter for solving block tridiagonal systems uses formula 4 at each step. 


MATRIX FACTORIZATIONS

1. $A = LU$ = (lower triangular $L$, 1’s on the diagonal) (upper triangular $U$, pivots on the diagonal)
Requirements: No row exchanges as Gaussian elimination reduces square $A$ to $U$.

2. $A = LDU$ = (lower triangular $L$, 1’s on the diagonal) (pivot matrix $D$ is diagonal) (upper triangular $U$, 1’s on the diagonal)
Requirements: No row exchanges. The pivots in $D$ are divided out to leave 1’s on
the diagonal of $U$. If $A$ is symmetric then $U$ is $L^T$ and $A = LDL^T$.

3. $PA = LU$ (permutation matrix $P$ to avoid zeros in the pivot positions).
Requirements: $A$ is invertible. Then $P, L, U$ are invertible. $P$ does all of the
row exchanges on $A$ in advance, to allow normal $LU$. Alternative: $A = L_1 P_1 U_1$.

4. $EA = R$ ($m$ by $m$ invertible $E$) (any $m$ by $n$ matrix $A$) = rref($A$).
Requirements: None! The reduced row echelon form $R$ has $r$ pivot rows and pivot
columns, containing the identity matrix. The last $m - r$ rows of $E$ are a basis for
the left nullspace of $A$; they multiply $A$ to give $m - r$ zero rows in $R$. The first $r$
columns of $E^{-1}$ are a basis for the column space of $A$.

5. $S = C^T C$ = (lower triangular) (upper triangular) with $\sqrt{D}$ on both diagonals
Requirements: $S$ is symmetric and positive definite (all $n$ pivots in $D$ are positive).
This Cholesky factorization $C$ = chol($S$) has $C^T = L\sqrt{D}$, so $S = C^T C = LDL^T$.

6. $A = QR$ = (orthonormal columns in $Q$) (upper triangular $R$).
Requirements: $A$ has independent columns. Those are orthogonalized in $Q$ by the
Gram-Schmidt or Householder process. If $A$ is square then $Q^{-1} = Q^T$.

7. $A = X \Lambda X^{-1}$ = (eigenvectors in $X$) (eigenvalues in $\Lambda$) (left eigenvectors in $X^{-1}$).
Requirements: $A$ must have $n$ linearly independent eigenvectors.

8. $S = Q \Lambda Q^T$ = (orthogonal matrix $Q$) (real eigenvalue matrix $\Lambda$) ($Q^T$ is $Q^{-1}$).
Requirements: $S$ is real and symmetric: $S^T = S$. This is the Spectral Theorem.

9. $A = BJB^{-1}$ = (generalized eigenvectors in $B$) (Jordan blocks in $J$) ($B^{-1}$).
Requirements: $A$ is any square matrix. This Jordan form $J$ has a block for each
independent eigenvector of $A$. Every block has only one eigenvalue.

10. $A = U \Sigma V^T$ = (orthogonal $U$ is $m \times m$) ($m \times n$ singular value matrix, $\sigma_1, \ldots, \sigma_r$ on its diagonal) (orthogonal $V$ is $n \times n$).
Requirements: None. This Singular Value Decomposition (SVD) has the eigenvectors
of $AA^T$ in $U$ and eigenvectors of $A^T A$ in $V$; $\sigma_i = \sqrt{\lambda_i(A^T A)} = \sqrt{\lambda_i(AA^T)}$.
Those singular values are $\sigma_1 \geq \sigma_2 \geq \cdots \geq \sigma_r > 0$. By column-row multiplication
$A = U \Sigma V^T = \sigma_1 u_1 v_1^T + \cdots + \sigma_r u_r v_r^T$.
If $S$ is symmetric positive definite then $U = V = Q$ and $\Sigma = \Lambda$ and $S = Q \Lambda Q^T$.

11. $A^+ = V \Sigma^+ U^T$ = (orthogonal $n \times n$) ($n \times m$ pseudoinverse of $\Sigma$, $1/\sigma_1, \ldots, 1/\sigma_r$ on diagonal) (orthogonal $m \times m$).
Requirements: None. The pseudoinverse $A^+$ has $A^+ A$ = projection onto row space
of $A$ and $AA^+$ = projection onto column space. $A^+ = A^{-1}$ if $A$ is invertible. The
shortest least-squares solution to $Ax = b$ is $x^+ = A^+ b$. This solves $A^T A x^+ = A^T b$.

12. $A = QS$ = (orthogonal matrix $Q$) (symmetric positive definite matrix $S$).
Requirements: $A$ is invertible. This polar decomposition has $S^2 = A^T A$. The
factor $S$ is semidefinite if $A$ is singular. The reverse polar decomposition $A = KQ$
has $K^2 = AA^T$. Both have $Q = UV^T$ from the SVD.

13. $A = U \Lambda U^{-1}$ = (unitary $U$) (eigenvalue matrix $\Lambda$) ($U^{-1}$ which is $U^H = \overline{U}^T$).
Requirements: $A$ is normal: $A^H A = AA^H$. Its orthonormal (and possibly complex)
eigenvectors are the columns of $U$. Complex $\lambda$’s unless $S = S^H$: Hermitian case.

14. $A = QTQ^{-1}$ = (unitary $Q$) (triangular $T$ with $\lambda$’s on diagonal) ($Q^{-1} = Q^H$).
Requirements: Schur triangularization of any square $A$. There is a matrix $Q$ with
orthonormal columns that makes $Q^{-1} A Q$ triangular: Section 6.4.

15. $F_n = \begin{bmatrix} I & D \\ I & -D \end{bmatrix} \begin{bmatrix} F_{n/2} & \\ & F_{n/2} \end{bmatrix} \begin{bmatrix} \text{even-odd} \\ \text{permutation} \end{bmatrix}$ = one step of the recursive FFT.
Requirements: $F_n$ = Fourier matrix with entries $w^{jk}$ where $w^n = 1$: $F_n \overline{F}_n = nI$.
$D$ has $1, w, \ldots, w^{n/2 - 1}$ on its diagonal. For $n = 2^\ell$ the Fast Fourier Transform
will compute $F_n x$ with only $\frac{1}{2} n \ell = \frac{1}{2} n \log_2 n$ multiplications from $\ell$ stages of $D$’s.
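As a small illustration of factorization 1, here is a sketch of elimination without row exchanges, written with exact fractions. The function name `lu_no_exchanges` is made up for this example. It stops with an error when a zero pivot would force a row exchange, which is the case that factorization 3 ($PA = LU$) handles.

```python
from fractions import Fraction


def lu_no_exchanges(A):
    """Factor square A into L (1's on the diagonal) and U (pivots on the diagonal)."""
    n = len(A)
    U = [[Fraction(a) for a in row] for row in A]
    L = [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]
    for k in range(n):
        for i in range(k + 1, n):
            if U[i][k] == 0:
                continue                              # nothing to eliminate in this row
            if U[k][k] == 0:
                raise ValueError("zero pivot: a row exchange would be needed")
            L[i][k] = U[i][k] / U[k][k]               # multiplier l_ik
            for j in range(k, n):
                U[i][j] -= L[i][k] * U[k][j]          # row i minus l_ik times row k
    return L, U


L, U = lu_no_exchanges([[2, 1], [6, 8]])
print(L, U)
```

For this 2 by 2 matrix the only multiplier is $6/2 = 3$, and the pivots 2 and 5 appear on the diagonal of $U$.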

Six Great Theorems / Linear Algebra in a Nutshell


Six Great Theorems of Linear Algebra 


Dimension Theorem   All bases for a vector space have the same number of vectors.
Counting Theorem   Dimension of column space + dimension of nullspace = number of columns.
Rank Theorem   Dimension of column space = dimension of row space. This is the rank.
Fundamental Theorem   The row space and nullspace of $A$ are orthogonal complements in $\mathbf{R}^n$.
SVD   There are orthonormal bases ($v$’s and $u$’s for the row and column spaces) so that $Av_i = \sigma_i u_i$.
Spectral Theorem   If $A^T = A$ there are orthonormal $q$’s so that $Aq_i = \lambda_i q_i$ and $A = Q \Lambda Q^T$.


LINEAR ALGEBRA IN A NUTSHELL 


(The matrix $A$ is $n$ by $n$)

Nonsingular                                      Singular

$A$ is invertible                                $A$ is not invertible
The columns are independent                      The columns are dependent
The rows are independent                         The rows are dependent
The determinant is not zero                      The determinant is zero
$Ax = 0$ has one solution $x = 0$                $Ax = 0$ has infinitely many solutions
$Ax = b$ has one solution $x = A^{-1}b$          $Ax = b$ has no solution or infinitely many
$A$ has $n$ (nonzero) pivots                     $A$ has $r < n$ pivots
$A$ has full rank $r = n$                        $A$ has rank $r < n$
The reduced row echelon form is $R = I$          $R$ has at least one zero row
The column space is all of $\mathbf{R}^n$        The column space has dimension $r < n$
The row space is all of $\mathbf{R}^n$           The row space has dimension $r < n$
All eigenvalues are nonzero                      Zero is an eigenvalue of $A$
$A^T A$ is symmetric positive definite           $A^T A$ is only semidefinite
$A$ has $n$ (positive) singular values           $A$ has $r < n$ singular values


